LMSpeed
© 2026 LMSpeed. All rights reserved. Made by Nexmoe with ❤️

Free LLM API Models

Discover free LLM API models across providers with real speed and latency benchmarks.

Free models: 456
Providers: 59
Free offerings: 738
Speed is the median output tokens per second and Latency the median time to first token, both from real API tests. Tests is the number of benchmark runs; N/A means the offering has not been benchmarked yet.

Model | Developer | Capabilities | Providers | Speed (t/s) | Latency (s) | Tests
--- | --- | --- | --- | --- | --- | ---
Claude Sonnet 4.6 | Anthropic | Reasoning, Tools, Files, Vision | Seamee API, 小水管 API | 30.62 | 4.12 | 10
Claude Haiku 4.5 | Anthropic | Tools, Multimodal, 200K context | 我不是AI神, 小水管 API | 139.49 | 12.44 | 5
GLM-5.1 | Zhipu AI | Reasoning, Tools, 200K context | 小水管 API, MapleLeaf API | 36.98 | 20.15 | 5
GLM-4.6V Flash | Zhipu AI | Reasoning, Tools, Files, Vision | WSocket AI | N/A | N/A | 0
GLM-4.7 | Zhipu AI | Reasoning, Tools, Open source, 200K context | 小水管 API | N/A | N/A | 0
GLM-4.7 Flash | Zhipu AI | Reasoning, Tools, 200K context | WSocket AI | N/A | N/A | 0
GLM-5 | Zhipu AI | Reasoning, Tools, Open source, 200K context | 小水管 API | N/A | N/A | 0
GLM-5 Turbo | Zhipu AI | Reasoning, Tools, 200K context | 小水管 API | N/A | N/A | 0
Qwen3 Max | Alibaba | Tools, 200K context | 小水管 API | N/A | N/A | 0
MiniMax M2.7 | MiniMax | Reasoning, Tools, Open source, 200K context | 小水管 API | 89.67 | 2.72 | 5
MiniMax M2.7 HighSpeed | MiniMax | Reasoning, Tools, Open source, 200K context | 小水管 API | 50.42 | 2.18 | 5
openrouter/free | OpenRouter | Reasoning, Tools, Files, Vision | CM-API 公益站, 猫羽霖API | N/A | N/A | 0
kat-coder-pro-v2 | Kuaishou | Tools, Structured output, 200K context | 小水管 API | N/A | N/A | 0
kilo-auto/free | Kilo Gateway | Reasoning, Tools, 200K context | 小水管 API | N/A | N/A | 0

Model descriptions:
  • Claude Sonnet 4.6: extends the Sonnet line with improved tool use, coding reliability, and long-context performance for everyday production workloads.
  • Claude Haiku 4.5: delivers fast, low-cost responses while retaining solid instruction following for chat, classification, and lightweight coding.
  • GLM-5.1: a next-generation GLM model aimed at frontier reasoning, coding, and bilingual agent applications.
  • GLM-4.6V Flash: a multimodal vision-language model in the GLM series, supporting both text and image understanding.
  • GLM-4.7: a flagship GLM release from Zhipu AI with advanced Chinese-English reasoning, coding, and agent features.
  • GLM-4.7 Flash: a fast, efficient GLM-series model optimized for quick responses and high throughput.
  • GLM-5: Zhipu's flagship GLM-series model, with enhanced reasoning, agent capabilities, and strong performance in Chinese enterprise and coding scenarios.
  • GLM-5 Turbo: a fast, efficient GLM-series model optimized for quick responses and high throughput.
  • Qwen3 Max: the largest language model in the Qwen series, offering advanced reasoning, code generation, and multimodal capabilities.
  • MiniMax M2.7: a large language model in the MiniMax series, offering advanced reasoning, code generation, and multimodal capabilities.
  • MiniMax M2.7 HighSpeed: a fast, efficient MiniMax-series model optimized for quick responses and high throughput.
  • openrouter/free (Free Models Router): an AI model provided by OpenRouter.
  • kat-coder-pro-v2 (KAT-Coder-Pro V2): the latest high-performance model in Kuaishou's KAT-Coder line, designed for complex enterprise software engineering and SaaS integration. It further strengthens the agentic coding abilities of earlier versions.
  • kilo-auto/free (Kilo Auto Free): an AI model provided by Kilo.

LMSpeed tracks 456 LLM models available for free across 59 API providers. Free tiers vary by provider: some offer a limited number of daily requests, others give free credits to new users. All speed data comes from real API tests.

How to use

Find a free model, compare its providers, then open the provider or model detail page before you start testing.

  1. Filter by model family
     Use search, model-family filters, and capability tags to narrow the directory to the free models you actually want to try.

  2. Compare speed and latency
     Check provider count, benchmark volume, throughput, and first-token latency so you do not choose on price alone.

  3. Open the provider details
     Review the provider page for availability, health checks, pricing notes, and any free-tier limits before building on it.

Free LLM API FAQ

Common questions about free model tiers and how to use this directory.

What counts as a free LLM API on LMSpeed?

A model–provider pair whose current input and output prices are both $0 per token. Some providers offer permanent free tiers; others only give one-time credits to new accounts. We mark an offering as free only while its public pricing is zero, and re-check pricing pages regularly.
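The "both prices are zero" rule can be written as a one-line predicate. The sketch below is illustrative only: it assumes a hypothetical `Offering` record holding per-token USD prices, and is not LMSpeed's actual data model. Using `Decimal` avoids float rounding when a price is something like 0.000002.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Offering:
    """One model-provider pair with its public per-token prices (hypothetical record shape)."""
    model: str
    provider: str
    input_price: Decimal   # USD per input token, as listed on the provider's pricing page
    output_price: Decimal  # USD per output token


def is_free(offering: Offering) -> bool:
    # Free only while BOTH prices are exactly zero; cheap-but-nonzero does not count.
    return offering.input_price == 0 and offering.output_price == 0


def free_offerings(offerings: list[Offering]) -> list[Offering]:
    # Keep only the pairs that currently satisfy the rule.
    return [o for o in offerings if is_free(o)]


if __name__ == "__main__":
    catalog = [
        Offering("model-a", "provider-x", Decimal("0"), Decimal("0")),
        Offering("model-a", "provider-y", Decimal("0"), Decimal("0.000002")),
    ]
    for o in free_offerings(catalog):
        print(o.model, o.provider)
```

Because the check runs on current prices, re-running it after each pricing-page refresh naturally drops offerings whose promotion has ended.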

Are these free APIs really free? Any catch?

Most have rate limits — requests per minute, daily caps, or context-length limits — and many require account signup with a verified phone or payment method. Some are time-limited promotions. Always read the provider's terms and quotas before depending on a free endpoint.
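Requests-per-minute limits usually surface as HTTP 429 (Too Many Requests) responses. A common client-side mitigation is capped exponential backoff with jitter. The following is a minimal sketch, assuming your HTTP wrapper raises a hypothetical `RateLimited` exception on 429 and passes along any `Retry-After` value in seconds. It is not tied to any specific provider SDK.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error a client wrapper raises when the provider answers HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds from a Retry-After header, if any


def call_with_backoff(
    request: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying with capped exponential backoff while it is rate limited."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited as exc:
            if attempt == max_retries:
                raise  # still limited after the last retry: let the caller decide
            if exc.retry_after is not None:
                delay = exc.retry_after  # the provider said how long to wait
            else:
                # 1 s, 2 s, 4 s, ... capped at max_delay, then scaled by random jitter
                delay = min(max_delay, base_delay * 2 ** attempt) * random.uniform(0.5, 1.0)
            sleep(delay)
    raise ValueError("max_retries must be >= 0")
```

Backoff only smooths over per-minute limits. A daily cap stays exhausted no matter how long you wait, which is why the final failure is re-raised instead of being retried forever.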

Are community-run relays and non-profit aggregators included? Any extra caveats?

Some entries are community-run relays ("公益站", roughly "public-benefit sites") that pool paid upstream keys and redistribute access for free at the operator's expense. They often advertise larger quotas and a broader model list than official free tiers, but they are much less reliable: operators can shut down or disappear overnight, pricing and quotas can change without notice, and many sites are invite-only, requiring a GitHub invite, a forum referral, or membership in a closed community to register. Some keep signups disabled indefinitely. Treat them as best-effort backup channels and keep anything important on official paid endpoints.
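One way to use a relay strictly as a backup is to try channels in a fixed priority order, with the official endpoint first and the relay last. This is a minimal sketch under stated assumptions: each channel is a hypothetical `(name, send)` pair where `send` takes a prompt and returns the reply text. Real clients, authentication, and timeouts are left out.

```python
from typing import Callable, Sequence, Tuple

# A channel is a (name, send) pair; send takes a prompt and returns the reply text.
Channel = Tuple[str, Callable[[str], str]]


def first_success(channels: Sequence[Channel], prompt: str) -> Tuple[str, str]:
    """Try channels in priority order; return (channel name, reply) from the first that answers."""
    errors = []
    for name, send in channels:
        try:
            return name, send(prompt)
        except Exception as exc:  # broad on purpose: a relay can fail in arbitrary ways
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all channels failed: " + "; ".join(errors))
```

Returning the channel name along with the reply lets you log how often traffic actually falls through to the relay. If that happens often, it is a sign the primary endpoint needs more quota rather than more fallbacks.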

Which free LLM API is fastest?

Speed varies by model and provider. Sort the table by "Most tested" for the most reliable benchmarks, or pick a model family from the filter chips to drill in. Each row shows median tokens per second and median first-token latency from real API tests.
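The "Most tested" ordering can be approximated offline. The sketch below uses a subset of the benchmark rows listed on this page. It assumes a "Most tested" sort means more benchmark runs first, with ties broken by higher throughput; that tie-break rule is an assumption, not LMSpeed's documented behavior. Rows shown as N/A are modeled as `None` and sink to the end.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    model: str
    speed_tps: Optional[float]  # median output tokens/s; None where the table shows N/A
    latency_s: Optional[float]  # median time to first token in seconds; None where N/A
    tests: int                  # benchmark runs behind the medians


# A subset of the benchmark rows listed on this page.
ROWS = [
    Row("Claude Sonnet 4.6", 30.62, 4.12, 10),
    Row("Claude Haiku 4.5", 139.49, 12.44, 5),
    Row("GLM-5.1", 36.98, 20.15, 5),
    Row("GLM-4.7", None, None, 0),
    Row("MiniMax M2.7", 89.67, 2.72, 5),
    Row("MiniMax M2.7 HighSpeed", 50.42, 2.18, 5),
]


def most_tested(rows: list[Row]) -> list[Row]:
    # Most benchmark runs first; ties broken by higher throughput; untested rows sink to the end.
    return sorted(rows, key=lambda r: (-r.tests, r.speed_tps is None, -(r.speed_tps or 0.0)))


def lowest_latency(rows: list[Row]) -> Row:
    # Only rows that have actually been measured can win.
    return min((r for r in rows if r.latency_s is not None), key=lambda r: r.latency_s)


if __name__ == "__main__":
    for r in most_tested(ROWS):
        print(r.model, r.tests, r.speed_tps)
    print("lowest first-token latency:", lowest_latency(ROWS).model)
```

These rows show why both columns matter. Claude Haiku 4.5 has the highest throughput here (139.49 t/s) but one of the slowest first tokens (12.44 s), while MiniMax M2.7 HighSpeed has the lowest first-token latency (2.18 s) at moderate throughput.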

How does LMSpeed measure speed and latency?

We send identical prompts to each provider through a five-round stress test, count output tokens with tiktoken, and measure both throughput (tokens per second) and time to first token. Numbers are aggregated as medians to resist outliers and refresh on a regular cadence.
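The aggregation step can be sketched as follows. This assumes each round records when the request was sent, when the first streamed token arrived, when the stream finished, and how many output tokens came back. The token counts are taken as given here; in LMSpeed's pipeline they come from tiktoken. The assumption that throughput is measured over the generation phase only, after the first token, is ours and is not stated by LMSpeed.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Round:
    """One streamed request; timestamps in seconds from a monotonic clock such as time.perf_counter()."""
    sent: float          # request sent
    first_token: float   # first streamed token arrived
    done: float          # stream finished
    output_tokens: int   # tokens in the reply (counted separately, e.g. with tiktoken)


def ttft(r: Round) -> float:
    """Time to first token."""
    return r.first_token - r.sent


def throughput(r: Round) -> float:
    """Output tokens per second. Assumption: measured over generation only, after the first token."""
    gen_time = r.done - r.first_token
    if gen_time <= 0:
        raise ValueError("stream must finish after the first token")
    return r.output_tokens / gen_time


def summarize(rounds: list[Round]) -> dict[str, float]:
    # Medians rather than means, so one stalled round does not skew the result.
    return {
        "median_tps": statistics.median(throughput(r) for r in rounds),
        "median_ttft_s": statistics.median(ttft(r) for r in rounds),
    }


if __name__ == "__main__":
    # Five illustrative rounds (made-up numbers); the last one stalls before its first token.
    rounds = [
        Round(0.0, 0.5, 2.5, 100),
        Round(0.0, 0.25, 2.25, 120),
        Round(0.0, 1.0, 3.0, 80),
        Round(0.0, 0.5, 1.5, 70),
        Round(0.0, 4.0, 5.0, 10),
    ]
    print(summarize(rounds))
```

With these made-up rounds, the 4-second outlier would pull the mean time to first token up to 1.25 s, while the median stays at 0.5 s. That difference is why medians are the more robust summary for a five-round test.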

Can I use a free LLM API in production?

For prototypes, side projects, and low-traffic tools, yes. Production traffic will usually hit a rate limit quickly. Treat the free tier as an evaluation channel: validate the model and provider, then move to a paid endpoint with the same model when you scale.
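Keeping the model ID fixed and switching only the endpoint makes that move a configuration change rather than a code change. Here is a minimal sketch. The model ID, URLs, environment-variable names, and the `LLM_STAGE` switch are all hypothetical placeholders.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Endpoint:
    base_url: str
    api_key_env: str  # name of the environment variable that holds the API key
    model: str


MODEL = "example-model"  # hypothetical model ID, identical on both endpoints

# Hypothetical endpoints: only where requests go (and who bills them) changes.
ENDPOINTS = {
    "evaluate": Endpoint("https://free-provider.example/v1", "FREE_PROVIDER_API_KEY", MODEL),
    "production": Endpoint("https://paid-provider.example/v1", "PAID_PROVIDER_API_KEY", MODEL),
}


def current_endpoint(env: Mapping[str, str] = os.environ) -> Endpoint:
    """Pick the endpoint from a hypothetical LLM_STAGE variable; default to the free tier."""
    return ENDPOINTS[env.get("LLM_STAGE", "evaluate")]
```

Because both entries share one model ID, the quality you validated on the free tier is what you get after switching. Only rate limits and billing should differ.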

Why don't I see a specific model in this list?

Either no provider currently offers it for free, the free promotion ended, or it has not been benchmarked yet. Open the model's main page to compare paid options, or let us know about a missing free provider via the feedback link in the footer.
