Leaderboard
Multi-dimensional rankings based on model speed tests and provider health checks. Compare providers, endpoints, and reliability at a glance.
Ranked by average time to first token (TTFT); lower is better for responsiveness. A dash means no best-case sample was recorded.
| Rank | Provider / Model | Avg TTFT (s) | Best (s) | Worst (s) | Avg tokens/s | Total Tests |
|---|---|---|---|---|---|---|
| 1 | google/gemma-3-27b | 0.11 | - | 5.26 | 2.59 | 105 |
| 2 | zhipu/glm-4-9b | 0.18 | - | 2.10 | 17.89 | 75 |
| 3 | google/gemma-3-27b-it | 0.21 | 0.17 | 0.29 | 52.40 | 5 |
| 4 | meta/llama-4-maverick-17b-128e-instruct | 0.21 | 0.17 | 0.32 | 100.41 | 15 |
| 5 | nvidia/llama-3.1-nemotron-ultra-253b-v1 | 0.21 | 0.18 | 0.30 | 45.22 | 5 |
| 6 | tencent/Hunyuan-MT-7B | 0.24 | 0.21 | 0.28 | 56.34 | 5 |
| 7 | qwen/qwen3-8b | 0.28 | - | 0.94 | 9.92 | 10 |
| 8 | llama3.1-8b | 0.38 | 0.34 | 0.43 | 2301.73 | 10 |
| 9 | zai-org/GLM-4.5-Air | 0.39 | 0.36 | 0.41 | 85.68 | 5 |
| 10 | llama3.1-8B | 0.42 | 0.36 | 1.09 | 17820.14 | 85 |
| 11 | llama3.1-8B | 0.42 | 0.36 | 1.09 | 17820.14 | 85 |
| 12 | llama3.1-8B | 0.43 | 0.35 | 1.07 | 12425.64 | 240 |
| 13 | llama3.1-8b | 0.43 | 0.34 | 0.73 | 1910.78 | 5 |
| 14 | marin/marin-8b-instruct | 0.44 | 0.40 | 0.56 | 84.25 | 5 |
| 15 | openai/gpt-oss-120b | 0.45 | - | 1.04 | 85.28 | 10 |
| 16 | [A]-claude-opus-4-6 | 0.45 | 0.41 | 0.90 | 0.00 | 60 |
| 17 | [A]-claude-opus-4-6 | 0.45 | 0.41 | 0.90 | 0.00 | 60 |
| 18 | microsoft/phi-3-medium-128k-instruct | 0.46 | 0.43 | 0.56 | 17.84 | 5 |
| 19 | Qwen/Qwen3-Omni-30B-A3B-Instruct | 0.47 | 0.33 | 0.96 | 128.77 | 5 |
| 20 | meta/llama3-70b-instruct | 0.47 | 0.41 | 0.56 | 37.53 | 5 |
| 21 | nvidia/llama-3.1-nemoguard-8b-topic-control | 0.47 | 0.44 | 0.63 | 57.46 | 10 |
| 22 | meta/llama-4-maverick-17b-128e-instruct | 0.48 | 0.39 | 0.56 | 131.83 | 5 |
| 23 | institute-of-science-tokyo/llama-3.1-swallow-70b-instruct-v0.1 | 0.48 | 0.44 | 0.59 | 19.02 | 5 |
| 24 | zhipu/glm-4v-flash | 0.48 | 0.33 | 1.24 | 55.82 | 50 |
| 25 | abacusai/dracarys-llama-3.1-70b-instruct | 0.50 | 0.45 | 0.62 | 18.53 | 10 |
| 26 | qwen-3-235b-a22b-instruct-2507 | 0.50 | 0.45 | 0.58 | 910.44 | 5 |
| 27 | DeepSeek3.1-Terminus | 0.51 | 0.36 | 0.95 | 26.11 | 5 |
| 28 | zhipu/glm-4-9b | 0.51 | 0.38 | 0.89 | 47.66 | 15 |
| 29 | openai/gpt-oss-120b | 0.52 | 0.30 | 0.86 | 201.68 | 20 |
| 30 | glm-4-flash-250414 | 0.55 | 0.39 | 0.71 | 43.74 | 5 |
| 31 | mistralai/mistral-small-4-119b-2603 | 0.55 | 0.51 | 0.69 | 221.17 | 5 |
| 32 | Qwen/Qwen3-8B | 0.55 | 0.41 | 0.74 | 22.98 | 10 |
| 33 | qwen/qwen3-next-80b-a3b-instruct | 0.55 | 0.52 | 0.59 | 115.65 | 5 |
| 34 | Translate-Fast | 0.56 | 0.50 | 0.66 | 95.51 | 5 |
| 35 | grok-4-thinking | 0.56 | 0.50 | 0.63 | 53.71 | 5 |
| 36 | moonshotai/kimi-k2-instruct | 0.56 | 0.39 | 0.85 | 65.79 | 5 |
| 37 | google/gemma-3-27b-it | 0.57 | 0.52 | 0.85 | 53.73 | 10 |
| 38 | qwen3vl | 0.58 | 0.43 | 1.09 | 19.33 | 5 |
| 39 | Qwen/qwen3.5-0.8b | 0.59 | 0.46 | 1.02 | 241.83 | 5 |
| 40 | Qwen/qwen3.5-0.8b | 0.59 | 0.46 | 1.02 | 241.83 | 5 |
| 41 | Qwen/qwen3.5-0.8b | 0.59 | 0.46 | 1.02 | 241.83 | 5 |
| 42 | jimmy | 0.60 | 0.42 | 1.18 | 76101.33 | 15 |
| 43 | qwen3-max-preview | 0.60 | 0.54 | 0.68 | 74.20 | 5 |
| 44 | gpt-oss-120b | 0.61 | 0.35 | 1.52 | 1319.02 | 10 |
| 45 | gpt-oss-safeguard-20b | 0.63 | 0.43 | 0.85 | 610.97 | 5 |
| 46 | jimmy | 0.63 | 0.42 | 1.30 | 89773.31 | 10 |
| 47 | openai/gpt-oss-20b | 0.64 | 0.54 | 0.90 | 0.00 | 5 |
| 48 | google/gemini-2.0-flash-exp | 0.65 | 0.53 | 1.01 | 0.00 | 5 |
| 49 | zhipu/glm-4v-flash | 0.66 | 0.36 | 1.31 | 47.95 | 5 |
| 50 | deepseek-v3 | 0.67 | 0.42 | 1.02 | 13.43 | 5 |
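The two metrics in the table, time to first token and average tokens per second, can be measured client-side against any streaming endpoint. A minimal sketch, assuming the response is available as an iterable of token chunks (the `fake_stream` generator below is a stand-in for a real streamed response, not any specific provider's API):

```python
import time

def measure_stream(chunks):
    """Measure time-to-first-token (TTFT) and average tokens/s
    over an iterable of streamed token chunks."""
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0
    for _chunk in chunks:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first token arrived
        n_tokens += 1
    elapsed = time.perf_counter() - start
    ttft = (first_token_at - start) if first_token_at is not None else float("nan")
    tps = n_tokens / elapsed if elapsed > 0 else 0.0
    return ttft, tps

# Hypothetical stream: ~0.05 s delay before the first token, then 5 tokens.
def fake_stream():
    time.sleep(0.05)
    for tok in ["Hello", ",", " ", "world", "!"]:
        yield tok

ttft, tps = measure_stream(fake_stream())
```

Note that this counts stream chunks, not tokenizer tokens; a production harness would also repeat the run several times (the Total Tests column) and track best and worst TTFT across runs.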