Leaderboard
Multi-dimensional rankings based on model speed tests and provider health checks. Compare providers, endpoints, and reliability at a glance.
Throughput is the average number of tokens generated per second (t/s); higher values mean faster responses.
| Rank | Provider / Model | Throughput (t/s) | Best (t/s) | Worst (t/s) | Avg first token latency (s) | Total tests |
|---|---|---|---|---|---|---|
| 1 | QWEN | 26447.65 | 29225.31 | 22615.66 | 2.28 | 5 |
| 2 | auto_chat | 26226.28 | 29397.79 | 21527.43 | 2.31 | 10 |
| 3 | auto_chat | 26226.28 | 29397.79 | 21527.43 | 2.31 | 10 |
| 4 | QWEN | 25767.73 | 28486.04 | 22422.67 | 2.34 | 5 |
| 5 | QWEN | 25767.73 | 28486.04 | 22422.67 | 2.34 | 5 |
| 6 | QWEN | 25767.73 | 28486.04 | 22422.67 | 2.34 | 5 |
| 7 | QWEN | 24489.11 | 28226.09 | 22088.69 | 2.40 | 5 |
| 8 | QWEN | 24489.11 | 28226.09 | 22088.69 | 2.40 | 5 |
| 9 | QWEN | 23330.45 | 28540.31 | 16419.09 | 2.51 | 5 |
| 10 | QWEN | 23330.45 | 28540.31 | 16419.09 | 2.51 | 5 |
| 11 | QWEN | 21961.01 | 26829.29 | 8729.13 | 2.49 | 10 |
| 12 | QWEN | 21961.01 | 26829.29 | 8729.13 | 2.49 | 10 |
| 13 | SCQwen3 | 2399.97 | 7173.18 | 51.52 | 9.80 | 5 |
| 14 | llama3.1-8b | 2191.20 | 2380.02 | 1972.50 | 0.35 | 10 |
| 15 | CEREBRAS | 1473.52 | 1618.04 | 1197.58 | 3.54 | 5 |
| 16 | CEREBRAS | 1473.52 | 1618.04 | 1197.58 | 3.54 | 5 |
| 17 | llama-4-scout-17b-16e-instruct | 1372.80 | 2337.90 | 1013.16 | 0.36 | 5 |
| 18 | llama-3.3-70b | 1062.69 | 1189.55 | 947.98 | 0.51 | 5 |
| 19 | llama-4-maverick-17b-128e-instruct | 1052.78 | 1316.79 | 830.13 | 0.41 | 5 |
| 20 | qwen-3-coder-480b | 894.38 | 1231.29 | 476.56 | 0.35 | 5 |
| 21 | gemini-2.0-flash-lite | 853.10 | 1952.68 | 189.38 | 4.85 | 5 |
| 22 | gpt-oss-120b | 846.32 | 1592.38 | 529.76 | 0.70 | 5 |
| 23 | qwen-3-235b-a22b-instruct-2507 | 754.92 | 1013.76 | 548.86 | 0.45 | 5 |
| 24 | qwen-3-235b-a22b-instruct-2507 | 724.96 | 1015.33 | 536.96 | 0.60 | 5 |
| 25 | qwen-3-32b | 705.04 | 831.33 | 564.46 | 0.40 | 5 |
| 26 | asi1-extended | 681.48 | 849.76 | 559.82 | 2.18 | 5 |
| 27 | qwen-3-235b-a22b-thinking-2507 | 579.82 | 709.20 | 438.58 | 0.44 | 5 |
| 28 | ai.dev/gemini-2.5-flash-lite | 405.65 | 498.23 | 329.15 | 0.78 | 5 |
| 29 | accounts/fireworks/models/gpt-oss-20b | 359.18 | 380.03 | 345.85 | 1.14 | 10 |
| 30 | asi1-fast | 339.86 | 767.56 | 255.53 | 9.12 | 10 |
| 31 | gcli/gemini-2.5-flash | 332.90 | 589.48 | 218.35 | 8.61 | 5 |
| 32 | openai/gpt-5-nano | 321.87 | 464.65 | 145.16 | 19.88 | 5 |
| 33 | gemini-2.0-flash-001 | 282.75 | 340.86 | 173.41 | 2.70 | 5 |
| 34 | DeepSeek-V3.1 | 257.63 | 277.42 | 237.94 | 0.55 | 5 |
| 35 | x-ai/grok-4-fast:free | 233.02 | 297.76 | 137.48 | 2.32 | 5 |
| 36 | MBZUAI-IFM/K2-Think | 230.74 | 300.12 | 197.21 | 2.12 | 10 |
| 37 | gemini-2.5-flash | 221.56 | 290.68 | 155.34 | 9.25 | 5 |
| 38 | openrouter/sonoma-sky-alpha | 219.22 | 259.93 | 171.77 | 2.18 | 5 |
| 39 | models/gemini-2.5-flash | 180.81 | 201.06 | 158.47 | 7.98 | 5 |
| 40 | gemini-2.0-flash | 178.57 | 333.67 | 42.35 | 21.33 | 15 |
| 41 | models/gemini-2.5-flash-preview-09-2025 | 175.68 | 205.12 | 152.49 | 0.57 | 5 |
| 42 | o3-mini | 174.13 | 195.02 | 152.08 | 3.83 | 5 |
| 43 | grok-4-fast-non-reasoning | 167.62 | 187.89 | 151.31 | 0.90 | 5 |
| 44 | openai/gpt-oss-20b:free | 165.41 | 174.99 | 151.41 | 3.35 | 5 |
| 45 | qwen3-next-80b-a3b-instruct | 164.04 | 202.21 | 119.94 | 0.45 | 5 |
| 46 | gemini-2.0-flash | 163.32 | 186.75 | 131.84 | 0.59 | 5 |
| 47 | grok-3-mini | 154.69 | 170.00 | 145.81 | 3.10 | 5 |
| 48 | grok-4-fast-non-reasoning | 151.11 | 168.48 | 130.03 | 0.51 | 5 |
| 49 | qwen3-8b | 150.65 | 157.12 | 140.46 | 5.56 | 5 |
| 50 | moonshotai/kimi-k2-instruct | 149.26 | 187.53 | 105.96 | 0.49 | 5 |
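
The page does not say how each row's figures are derived from the individual speed tests. As a rough illustration only, the sketch below shows one way the average, best, and worst throughput and the average first token latency could be computed from per-test measurements. The `SpeedTest` record, its field names, and the choice to measure throughput over the generation phase (total time minus first token latency) are all assumptions made for this example, not the leaderboard's documented method.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SpeedTest:
    """One hypothetical speed-test run against a single endpoint."""
    tokens_generated: int       # tokens in the completed response
    first_token_latency: float  # seconds from request to first token
    total_time: float           # seconds from request to last token


def throughput(test: SpeedTest) -> float:
    """Tokens per second over the generation phase (assumed definition)."""
    generation_time = test.total_time - test.first_token_latency
    if generation_time <= 0:
        raise ValueError("total_time must exceed first_token_latency")
    return test.tokens_generated / generation_time


def summarize(tests: list[SpeedTest]) -> dict[str, float]:
    """Aggregate per-test measurements into one leaderboard-style row."""
    if not tests:
        raise ValueError("at least one test is required")
    rates = [throughput(t) for t in tests]
    return {
        "throughput": round(mean(rates), 2),  # average t/s across tests
        "best": round(max(rates), 2),         # fastest single test
        "worst": round(min(rates), 2),        # slowest single test
        "avg_first_token_latency": round(
            mean(t.first_token_latency for t in tests), 2
        ),
        "total_tests": len(tests),
    }
```

Under this definition, a long wait for the first token does not lower the throughput figure, which is one possible reason to report the two metrics in separate columns.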