Discover free LLM API models across providers with real speed and latency benchmarks.
| Model | Providers | Speed | Latency | Tests |
|---|---|---|---|---|
| Anthropic Claude Sonnet 4.6 extends the Sonnet line with improved tool use, coding reliability, and long-context performance for everyday production workloads. Tags: Anthropic, Free, Reasoning, Tools, Files, Vision. | | 30.62 t/s | 4.12 s | 10 |
| Anthropic Claude Haiku 4.5 delivers fast, low-cost responses while retaining solid instruction following for chat, classification, and lightweight coding. Tags: Anthropic, Free, Tools, Multimodal, 200K. | | 139.49 t/s | 12.44 s | 5 |
| Zhipu GLM-5.1 is a next-generation GLM model aimed at frontier reasoning, coding, and bilingual agent applications. Tags: Reasoning, Tools, 200K. | | 36.98 t/s | 20.15 s | 5 |
| Zhipu AI GLM-4.6V Flash is a multimodal vision-language model in the GLM series, supporting both text and image understanding. Tags: Reasoning, Tools, Files, Vision. | | N/A | N/A | 0 |
| Zhipu GLM-4.7 is a flagship GLM release from Zhipu AI with advanced Chinese-English reasoning, coding, and agent features. Tags: Tools, Reasoning, Open source, 200K. | | N/A | N/A | 0 |
| Zhipu AI GLM-4.7 Flash is a fast and efficient language model in the GLM series, optimized for quick responses and high throughput. Tags: Reasoning, Tools, 200K. | | N/A | N/A | 0 |
| Zhipu GLM-5 is Zhipu's flagship GLM-series model, with enhanced reasoning, agent capabilities, and strong performance on Chinese enterprise and coding scenarios. Tags: Tools, Reasoning, Open source, 200K. | | N/A | N/A | 0 |
| Zhipu AI GLM-5 Turbo is a fast and efficient language model in the GLM series, optimized for quick responses and high throughput. Tags: Tools, Reasoning, 200K. | | N/A | N/A | 0 |
| Alibaba Qwen3 Max is the largest language model in the Qwen series, offering advanced reasoning, code generation, and multimodal capabilities. Tags: Tools, 200K. | | N/A | N/A | 0 |
| MiniMax M2.7 is a large language model in the MiniMax series, offering advanced reasoning, code generation, and multimodal capabilities. Tags: Tools, Reasoning, Open source, 200K. | | 89.67 t/s | 2.72 s | 5 |
| MiniMax M2.7 HighSpeed is a fast and efficient language model in the MiniMax series, optimized for quick responses and high throughput. Tags: Reasoning, Open source, 200K, Tools. | | 50.42 t/s | 2.18 s | 5 |
| openrouter/free: Free Models Router is an AI model provided by openrouter. Tags: OpenRouter, Free, Reasoning, Tools, Files, Vision. | | N/A | N/A | 0 |
| kat-coder-pro-v2: KAT-Coder-Pro V2 is the latest high-performance model in Kuaishou's KAT-Coder series, designed for complex enterprise-grade software engineering and SaaS integration. It further strengthens the agentic coding capabilities of earlier versions. Tags: Kuaishou, Free, Tools, Structured output, 200K. | | N/A | N/A | 0 |
| kilo-auto/free: Kilo Auto Free is an AI model provided by kilo. Tags: Kilo Gateway, Free, Tools, Reasoning, 200K. | | N/A | N/A | 0 |

LMSpeed tracks 456 LLM models available for free across 59 API providers. Free tiers vary by provider: some offer limited daily requests, others provide free credits for new users. All speed data is from real API tests.
Find a free model, compare its providers, then open the provider or model detail page before you start testing.
Use search, model-family filters, and capability tags to narrow the directory to the free models you actually want to try.
Check provider count, benchmark volume, throughput, and first-token latency so you are not choosing on price alone.
Review the provider page for availability, health checks, pricing notes, and any free-tier limits before building on it.
Common questions about free model tiers and how to use this directory.
A model–provider pair whose current input and output prices are both $0 per token. Some providers offer permanent free tiers, others only give one-time credits to new accounts. We mark an offering as free only while its public pricing is zero, and re-check pricing pages regularly.
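The definition above can be written as a small predicate. This is a minimal sketch, assuming a hypothetical `Offering` record with per-token prices in USD; the field names and sample values are illustrative and are not taken from LMSpeed's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    """Hypothetical model-provider pair; field names are assumptions."""
    model: str
    provider: str
    input_price_usd: float   # price per input token
    output_price_usd: float  # price per output token


def is_free(offering: Offering) -> bool:
    # Free only when BOTH the input and the output price are zero.
    return offering.input_price_usd == 0 and offering.output_price_usd == 0


# Illustrative sample data, not real pricing.
offerings = [
    Offering("model-a", "provider-x", 0.0, 0.0),
    Offering("model-a", "provider-y", 0.0, 0.000002),
    Offering("model-b", "provider-x", 0.000001, 0.000003),
]

free_offerings = [o for o in offerings if is_free(o)]
```

Note that an offering with a free input price but a paid output price, like the second sample row, does not count as free under this rule.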
Most free tiers have rate limits (requests per minute, daily caps, or context-length limits), and many require account signup with a verified phone number or payment method. Some are time-limited promotions. Always read the provider's terms and quotas before depending on a free endpoint.
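One common way to cope with per-minute limits is to retry with exponential backoff. The sketch below assumes a hypothetical `RateLimited` exception that your own client code raises when a provider answers with HTTP 429; it is not part of any provider SDK, and the attempt count and delays are arbitrary defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by the caller's client on an HTTP 429."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `request` with exponential backoff and jitter on rate limits."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt (base, 2*base, 4*base, ...),
            # plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

Backoff only helps with short-window limits. A daily cap or an expired promotion will not clear on retry, so the quota terms still need to be checked.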
Some entries are community-run relays ("公益站", roughly "public-benefit sites") that bundle paid upstream keys and redistribute access for free at the operator's expense. They often advertise larger quotas and a broader model list than official free tiers, but reliability is much lower: operators can pull the plug or disappear overnight, pricing and quotas can change without notice, and many sites are invite-only, requiring a GitHub invite, a forum referral, or membership in a closed community to register. Some keep signups disabled indefinitely. Treat them as best-effort backup channels; keep anything important on official paid endpoints.
Speed varies by model and provider. Sort the table by Most tested for the most reliable benchmarks, or pick a model family from the chips above to drill in. Each row shows median tokens per second and first-token latency from real API tests.
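The "Most tested" ordering can be reproduced offline on a few rows copied from the table. This is a rough sketch: the `Row` type is hypothetical, and breaking ties by higher speed is an assumption, since the directory does not document its tie-breaking rule.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    model: str
    speed_tps: Optional[float]   # median tokens per second, None if untested
    latency_s: Optional[float]   # median first-token latency in seconds
    tests: int


rows = [
    Row("Zhipu GLM-5.1", 36.98, 20.15, 5),
    Row("Zhipu GLM-4.7", None, None, 0),
    Row("MiniMax M2.7", 89.67, 2.72, 5),
    Row("Claude Sonnet 4.6", 30.62, 4.12, 10),
    Row("MiniMax M2.7 HighSpeed", 50.42, 2.18, 5),
]


def most_tested(rows):
    # Most tests first; ties broken by higher speed (an assumption).
    # Untested rows have 0 tests and therefore sort last.
    return sorted(rows, key=lambda r: (-r.tests, -(r.speed_tps or 0.0)))


ranked = most_tested(rows)
for r in ranked:
    print(r.model, r.tests, r.speed_tps)
```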
We send identical prompts to each provider through a five-round stress test, count output tokens with tiktoken, and measure both throughput (tokens per second) and time to first token. Numbers are aggregated as medians to resist outliers and refresh on a regular cadence.
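The aggregation step can be sketched as follows. The `Round` fields, the sample numbers, and the choice to measure throughput over the generation window (after the first token) are all assumptions for illustration; the text above does not say which time window LMSpeed divides by. Token counts are taken as inputs here, whereas the real pipeline counts them with tiktoken.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Round:
    """One benchmark round; field names are assumptions for this sketch."""
    ttft_s: float        # seconds from request start to first token
    total_s: float       # seconds from request start to last token
    output_tokens: int   # output token count, supplied by the caller


def tokens_per_second(r: Round) -> float:
    # Assumption: throughput covers only the generation window,
    # i.e. the time after the first token arrives.
    return r.output_tokens / (r.total_s - r.ttft_s)


def summarize(rounds):
    """Return (median tokens per second, median time to first token)."""
    return (
        median(tokens_per_second(r) for r in rounds),
        median(r.ttft_s for r in rounds),
    )


# Five made-up rounds; the fourth is a slow outlier.
rounds = [
    Round(ttft_s=1.0, total_s=6.0, output_tokens=500),
    Round(ttft_s=1.5, total_s=6.5, output_tokens=450),
    Round(ttft_s=0.5, total_s=5.5, output_tokens=550),
    Round(ttft_s=9.0, total_s=19.0, output_tokens=100),
    Round(ttft_s=1.25, total_s=5.25, output_tokens=420),
]

speed, ttft = summarize(rounds)
```

With these sample rounds the slow fourth round barely moves the medians, while a mean would be pulled down sharply, which is the reason for aggregating with medians.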
Free tiers are usually enough for prototypes, side projects, and low-traffic tools. Production traffic will usually hit a rate limit quickly. Treat the free tier as an evaluation channel: validate the model and provider, then move to a paid endpoint with the same model when you scale.
If a model is missing from this list, either no provider currently offers it for free, the free promotion ended, or it has not been benchmarked yet. Open the model's main page to compare paid options, or let us know about a missing free provider via the feedback link in the footer.