Leaderboard
Multi-dimensional rankings based on model speed tests and provider health checks. Compare providers, endpoints, and reliability at a glance.
Throughput: average tokens generated per second (t/s). Higher is better for fast responses.
| Rank | Provider | Model | Throughput | Avg first token latency | Total Tests |
|---|---|---|---|---|---|
| 1 | | gemini-2.5-flash-preview-04-17-thinking | 203658.15 t/s (Best: 242039.75, Worst: 124802.05) | 16.10s | 5 |
| 2 | | spark-desk-lite | 28750.49 t/s (Best: 32770.86, Worst: 26708.17) | 7.10s | 5 |
| 3 | | gemini-2.0-flash | 12254.31 t/s (Best: 39232.11, Worst: 602.41) | 9.63s | 10 |
| 4 | | deepseek_7b | 4773.40 t/s (Best: 4820.50, Worst: 4721.69) | 0.45s | 5 |
| 5 | | opengvlab/internvl3-2b:free | 379.88 t/s (Best: 397.99, Worst: 365.98) | 1.22s | 5 |
| 6 | | deepseek/deepseek-v3-0324 | 377.75 t/s (Best: 1701.32, Worst: 43.06) | 3.44s | 5 |
| 7 | | inception/mercury-coder-small-beta | 331.42 t/s (Best: 402.56, Worst: 235.89) | 0.72s | 5 |
| 8 | | qwen3:0.6b | 239.34 t/s (Best: 291.06, Worst: 223.84) | 0.40s | 5 |
| 9 | | qwen3:0.6b | 239.34 t/s (Best: 291.06, Worst: 223.84) | 0.40s | 5 |
| 10 | | gemini-2.0-flash | 175.37 t/s (Best: 193.92, Worst: 154.56) | 2.48s | 5 |
| 11 | | gpt-4.1-nano-2025-04-14 | 152.16 t/s (Best: 163.61, Worst: 135.66) | 0.82s | 5 |
| 12 | | gemini-2.0-flash-lite-preview-02-05 | 146.97 t/s (Best: 175.97, Worst: 124.49) | 0.81s | 5 |
| 13 | | google/gemini-2.5-flash-preview | 140.85 t/s (Best: 169.07, Worst: 114.25) | 2.05s | 10 |
| 14 | | hunyuan-lite | 133.10 t/s (Best: 138.69, Worst: 120.73) | 1.04s | 5 |
| 15 | | fradser/deeptranslate-r2-4b:latest | 123.70 t/s (Best: 137.94, Worst: 103.24) | 0.78s | 10 |
| 16 | | fradser/deeptranslate-r2-4b:latest | 123.70 t/s (Best: 137.94, Worst: 103.24) | 0.78s | 10 |
| 17 | | fradser/deeptranslate-r2-4b:latest | 123.70 t/s (Best: 137.94, Worst: 103.24) | 0.78s | 10 |
| 18 | | meta-llama/Llama-4-Scout-17B-16E-Instruct | 118.72 t/s (Best: 124.82, Worst: 113.03) | 1.04s | 5 |
| 19 | | qwen3:30b-a3b | 116.49 t/s (Best: 119.21, Worst: 114.63) | 1.27s | 5 |
| 20 | | qwen3:30b-a3b | 116.49 t/s (Best: 119.21, Worst: 114.63) | 1.27s | 5 |
| 21 | | deepseek-ai/DeepSeek-R1-0528-Qwen3-8B | 105.14 t/s (Best: 112.88, Worst: 91.29) | 5.48s | 5 |
| 22 | | gpt-4o-2024-05-13 | 100.94 t/s (Best: 129.87, Worst: 91.03) | 0.49s | 5 |
| 23 | | qwen3:30b-a3b | 100.03 t/s (Best: 123.38, Worst: 90.81) | 1.98s | 5 |
| 24 | | qwen3:30b-a3b | 100.03 t/s (Best: 123.38, Worst: 90.81) | 1.98s | 5 |
| 25 | | gemini-2.0-pro-exp | 93.92 t/s (Best: 115.98, Worst: 73.88) | 17.47s | 5 |
| 26 | | qwen3:30b-a3b | 93.34 t/s (Best: 94.85, Worst: 91.33) | 1.58s | 10 |
| 27 | | qwen3:30b-a3b | 93.34 t/s (Best: 94.85, Worst: 91.33) | 1.58s | 10 |
| 28 | | grok-3-fast-beta | 89.18 t/s (Best: 116.86, Worst: 74.71) | 0.75s | 5 |
| 29 | | qwen3:30b-a3b-q8_0 | 84.32 t/s (Best: 85.38, Worst: 83.28) | 0.55s | 5 |
| 30 | | qwen3:30b-a3b-q8_0 | 84.32 t/s (Best: 85.38, Worst: 83.28) | 0.55s | 5 |
| 31 | | qwen/qwen3-30b-a3b:free | 83.19 t/s (Best: 184.42, Worst: 13.71) | 22.71s | 5 |
| 32 | | unsloth/qwen3:30b-a3b-q8_0 | 82.71 t/s (Best: 83.33, Worst: 81.38) | 2.21s | 5 |
| 33 | | unsloth/qwen3:30b-a3b-q8_0 | 82.71 t/s (Best: 83.33, Worst: 81.38) | 2.21s | 5 |
| 34 | | grok-3-mini-beta | 81.33 t/s (Best: 113.35, Worst: 61.88) | 6.02s | 5 |
| 35 | | deepseek-ai/DeepSeek-Prover-V2-671B | 80.53 t/s (Best: 83.94, Worst: 75.75) | 1.05s | 10 |
| 36 | | gpt-4.1-nano | 78.77 t/s (Best: 103.70, Worst: 45.31) | 2.19s | 10 |
| 37 | | o4-mini | 78.69 t/s (Best: 99.58, Worst: 38.23) | 3.39s | 5 |
| 38 | | deepseek-ai/DeepSeek-V3-0324 | 74.08 t/s (Best: 78.55, Worst: 68.87) | 1.08s | 5 |
| 39 | | /root/models/Qwen/Qwen3-4B | 72.15 t/s (Best: 72.69, Worst: 71.44) | 0.56s | 5 |
| 40 | | /root/models/Qwen/Qwen3-4B | 72.15 t/s (Best: 72.69, Worst: 71.44) | 0.56s | 5 |
| 41 | | deepseek-ai/DeepSeek-R1 | 71.38 t/s (Best: 78.88, Worst: 66.31) | 11.96s | 5 |
| 42 | | QwQ-32B | 69.72 t/s (Best: 70.16, Worst: 68.98) | 14.39s | 5 |
| 43 | | Qwen/Qwen3-30B-A3B | 69.20 t/s (Best: 136.72, Worst: 21.05) | 13.40s | 15 |
| 44 | | Qwen/Qwen3-235B-A22B-FP8 | 66.11 t/s (Best: 68.83, Worst: 59.67) | 14.31s | 5 |
| 45 | | Qwen/QwQ-32B | 63.70 t/s (Best: 73.47, Worst: 55.60) | 23.60s | 5 |
| 46 | | unsloth/qwen3:14b-q8_0 | 61.79 t/s (Best: 63.00, Worst: 60.77) | 1.51s | 5 |
| 47 | | unsloth/qwen3:14b-q8_0 | 61.79 t/s (Best: 63.00, Worst: 60.77) | 1.51s | 5 |
| 48 | | qwen3:30b | 61.59 t/s (Best: 66.59, Worst: 41.92) | 0.77s | 10 |
| 49 | | glm-4-flash-250414 | 58.60 t/s (Best: 69.21, Worst: 46.31) | 0.31s | 10 |
| 50 | | zhipu/glm-4v-flash | 57.09 t/s (Best: 73.22, Worst: 31.85) | 1.11s | 5 |
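
Each row reports an average throughput with best and worst values, an average first token latency, and a test count. The Python sketch below shows one way such a row could be derived from individual test runs. It is an illustration only: the `SpeedTest` fields, the choice to time throughput over the generation phase (excluding the wait for the first token), and the use of a simple mean over per-run rates are all assumptions, not the leaderboard's documented method.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SpeedTest:
    """One hypothetical test run (field names are assumptions, not the site's schema)."""
    output_tokens: int          # tokens generated in the response
    total_seconds: float        # wall-clock time from request to last token
    first_token_seconds: float  # wall-clock time from request to first token


def tokens_per_second(run: SpeedTest) -> float:
    # Assumption: throughput is measured over the generation phase only,
    # i.e. the time after the first token arrives.
    generation_seconds = run.total_seconds - run.first_token_seconds
    if generation_seconds <= 0:
        raise ValueError("total_seconds must exceed first_token_seconds")
    return run.output_tokens / generation_seconds


def summarize(runs: list[SpeedTest]) -> dict[str, float]:
    """Aggregate runs into the columns shown in the table."""
    if not runs:
        raise ValueError("need at least one run")
    rates = [tokens_per_second(r) for r in runs]
    return {
        "throughput": mean(rates),   # assumed: mean of per-run rates
        "best": max(rates),
        "worst": min(rates),
        "avg_first_token_latency": mean(r.first_token_seconds for r in runs),
        "total_tests": len(runs),
    }
```

With only five or ten runs per row, a single outlier can move the average a long way, which is why the best and worst values are worth reading alongside it.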