# Leaderboard
Multi-dimensional rankings based on model speed tests and provider health checks. Compare providers, endpoints, and reliability at a glance.
Throughput is the average number of tokens generated per second; higher means faster responses.
| Rank | Provider / Model | Avg Throughput (t/s) | Best (t/s) | Worst (t/s) | Avg first-token latency | Total Tests |
|---|---|---|---|---|---|---|
| 1 | jimmy | 89773.31 | 145658.50 | 13204.57 | 0.63s | 10 |
| 2 | jimmy | 76101.33 | 138352.88 | 15563.41 | 0.60s | 15 |
| 3 | gpt-5-high | 62438.11 | 123417.05 | 182.61 | 14.27s | 5 |
| 4 | minimax/minimax-m2.5:free | 42367.74 | 110596.73 | 123.85 | 15.41s | 5 |
| 5 | kilo-auto/free | 42116.89 | 141870.06 | 99.41 | 14.36s | 10 |
| 6 | claude-opus-4-6 | 35523.93 | 97038.07 | 40.92 | 10.21s | 10 |
| 7 | claude-opus-4-6 | 35283.10 | 73526.11 | 0.00 | 9.02s | 10 |
| 8 | kilo/minimax-m2.5 | 18859.22 | 31447.63 | 45.10 | 9.38s | 5 |
| 9 | llama3.1-8B | 17820.14 | 41610.30 | 1523.25 | 0.42s | 85 |
| 10 | llama3.1-8B | 17820.14 | 41610.30 | 1523.25 | 0.42s | 85 |
| 11 | translate-model | 15990.56 | 48227.39 | 161.10 | 1.25s | 10 |
| 12 | llama3.1-8B | 14956.18 | 23028.86 | 4292.03 | 1.60s | 15 |
| 13 | claude-opus-4-6 | 14565.00 | 25852.70 | 41.37 | 7.95s | 10 |
| 14 | claude-sonnet-4-6 | 13044.40 | 39948.79 | 0.00 | 5.86s | 10 |
| 15 | llama3.1-8B | 12425.64 | 170331.58 | 335.19 | 0.43s | 240 |
| 16 | claude-sonnet-4-5-20250929 | 12401.37 | 27448.31 | 0.00 | 7.46s | 20 |
| 17 | anthropic/claude-sonnet-4.6 | 11506.04 | 44409.41 | 36.75 | 3.12s | 10 |
| 18 | grok-imagine-1.0-fast | 4998.02 | 7933.91 | 1462.69 | 4.80s | 15 |
| 19 | anthropic/claude-sonnet-4.6 | 4641.90 | 25206.38 | 39.10 | 3.45s | 10 |
| 20 | claude-sonnet-4-5-20250929 | 4552.11 | 23856.21 | 38.64 | 3.17s | 10 |
| 21 | claude-haiku-4-5-20251001 | 3615.46 | 17553.74 | 153.66 | 4.65s | 10 |
| 22 | llama3.1-8b | 2301.73 | 2860.23 | 1596.67 | 0.38s | 10 |
| 23 | llama3.1-8b | 1910.78 | 2242.04 | 1101.22 | 0.43s | 5 |
| 24 | gpt-oss-120b | 1467.36 | 1785.19 | 1053.27 | 0.82s | 5 |
| 25 | claude-sonnet-4-20250514 | 1414.76 | 2664.12 | 764.55 | 2.70s | 10 |
| 26 | claude-sonnet-4-5-20250929 | 1336.95 | 1962.60 | 838.68 | 4.09s | 5 |
| 27 | gpt-oss-120b | 1319.02 | 2371.72 | 640.36 | 0.61s | 10 |
| 28 | kimi-k2.5 | 1268.36 | 2318.76 | 489.30 | 2.12s | 35 |
| 29 | gemini-2.5-flash-lite-preview-09-2025-thinking | 1220.53 | 2646.27 | 359.25 | 7.20s | 5 |
| 30 | claude-3-5-sonnet-20241022 | 1068.52 | 2033.08 | 461.35 | 2.74s | 10 |
| 31 | qwen-3-235b-a22b-instruct-2507 | 910.44 | 1417.04 | 592.53 | 0.50s | 5 |
| 32 | qwen-3-235b | 878.97 | 1762.38 | 523.26 | 1.62s | 5 |
| 33 | openai/gpt-oss-safeguard-20b | 869.93 | 1144.67 | 637.27 | 0.67s | 5 |
| 34 | claude-3-5-sonnet-20241022 | 839.26 | 1803.13 | 338.25 | 2.68s | 5 |
| 35 | nvidia/llama-3.1-nemoguard-8b-content-safety | 628.67 | 1583.25 | 7.99 | 1.03s | 5 |
| 36 | gpt-oss-safeguard-20b | 610.97 | 953.66 | 394.19 | 0.63s | 5 |
| 37 | qwen-opus:latest | 522.69 | 2309.67 | 49.88 | 7.59s | 5 |
| 38 | 酒馆-Teller | 509.57 | 963.76 | 323.85 | 14.47s | 5 |
| 39 | gpt-5 | 490.00 | 719.34 | 118.57 | 13.34s | 10 |
| 40 | gpt-5-codex | 465.66 | 793.38 | 175.13 | 3.30s | 5 |
| 41 | gemini-3.1-pro-preview | 462.04 | 1827.55 | 82.45 | 23.27s | 5 |
| 42 | qwen2.5:1.5b | 435.32 | 480.86 | 273.41 | 1.63s | 5 |
| 43 | qwen2.5:1.5b | 435.32 | 480.86 | 273.41 | 1.63s | 5 |
| 44 | qwen2.5:1.5b | 435.32 | 480.86 | 273.41 | 1.63s | 5 |
| 45 | CPA/gpt-5.4-mini | 430.22 | 858.15 | 167.01 | 3.41s | 5 |
| 46 | gpt-5.2-codex | 404.70 | 1706.85 | 53.60 | 4.71s | 5 |
| 47 | gpt-5-codex | 402.13 | 592.56 | 149.00 | 2.22s | 5 |
| 48 | gpt-5.4-mini | 382.82 | 727.09 | 129.16 | 4.49s | 5 |
| 49 | qwen3.5-0.8b | 354.52 | 369.51 | 319.73 | 0.82s | 10 |
| 50 | qwen3.5-0.8b | 354.52 | 369.51 | 319.73 | 0.82s | 10 |
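The two metrics in the table, throughput and first-token latency, can both be derived from timestamps taken while consuming a streamed response. Below is a minimal sketch of that measurement; `fake_stream` is a stand-in for any streaming client that yields tokens, not a real API.

```python
import time

def measure_stream(token_iter):
    """Consume a token stream and return (first-token latency in seconds,
    average throughput in tokens per second)."""
    start = time.monotonic()
    first_token_latency = None
    n_tokens = 0
    for _ in token_iter:
        now = time.monotonic()
        if first_token_latency is None:
            # Time from request start until the first token arrives.
            first_token_latency = now - start
        n_tokens += 1
    elapsed = time.monotonic() - start
    throughput = n_tokens / elapsed if elapsed > 0 else 0.0
    return first_token_latency, throughput

def fake_stream():
    """Simulated stream: ~0.1 s to the first token, then 4 more tokens."""
    time.sleep(0.1)
    for tok in ["Hello", ",", " ", "world", "!"]:
        yield tok

ttft, tps = measure_stream(fake_stream())
```

Averaging `tps` across repeated runs gives the table's Avg Throughput, while the per-run Best and Worst columns are simply its maximum and minimum.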