Leaderboard
Model performance rankings based on speed test results. Compare models across different providers and endpoints.
Throughput is the average number of tokens generated per second (t/s). Higher is better for fast responses.
| Rank | Provider | Model | Throughput | Avg first token latency | Total Tests |
|---|---|---|---|---|---|
| 1 | | zai-glm-4.7 | 454.25 t/s (best 609.14, worst 295.71) | 3.57s | 5 |
| 2 | | meta-llama/Llama-3.3-70B-Instruct | 416.19 t/s (best 538.29, worst 336.89) | 0.26s | 5 |
| 3 | Hugging Face (router.huggingface.co) | meta-llama/Llama-3.3-70B-Instruct | 416.19 t/s (best 538.29, worst 336.89) | 0.26s | 5 |
| 4 | sd.rnglg2.top:30000 | gpt-oss-120b-medium | 329.97 t/s (best 378.13, worst 266.67) | 2.58s | 5 |
| 5 | sd.rnglg2.top:30000 | gpt-5-codex-mini | 327.07 t/s (best 407.63, worst 157.66) | 3.35s | 5 |
| 6 | sd.rnglg2.top:30000 | gpt-5.1-codex-mini | 248.33 t/s (best 265.97, worst 214.97) | 1.91s | 5 |
| 7 | 酒馆无限制免费API (api2.aoyou.shop) | 酒馆-Flash-Long | 194.62 t/s (best 212.42, worst 179.41) | 1.75s | 5 |
| 8 | Fireworks AI (api.fireworks.ai) | accounts/fireworks/models/minimax-m2p1 | 185.81 t/s (best 216.27, worst 154.65) | 1.91s | 10 |
| 9 | Fireworks AI (api.fireworks.ai) | accounts/fireworks/models/gpt-oss-120b | 144.71 t/s (best 179.95, worst 127.43) | 0.79s | 10 |
| 10 | DashScope (dashscope.aliyuncs.com) | qwen-flash | 142.89 t/s (best 158.26, worst 128.60) | 0.51s | 5 |
| 11 | AI Tools (platform.aitools.cfd) | zhipu/glm-4.6v-flash | 134.33 t/s (best 240.96, worst 87.08) | 12.98s | 5 |
| 12 | Fireworks AI (api.fireworks.ai) | accounts/fireworks/models/kimi-k2-thinking | 127.75 t/s (best 151.65, worst 83.90) | 3.47s | 5 |
| 13 | Fireworks AI (api.fireworks.ai) | accounts/fireworks/models/deepseek-v3p2 | 114.61 t/s (best 142.32, worst 94.19) | 0.49s | 5 |
| 14 | Fireworks AI (api.fireworks.ai) | accounts/fireworks/models/glm-4p7 | 107.96 t/s (best 156.99, worst 52.92) | 10.19s | 15 |
| 15 | Fireworks AI (api.fireworks.ai) | accounts/fireworks/models/glm-4p7 | 98.48 t/s (best 134.12, worst 61.42) | 11.45s | 5 |
| 16 | AI Tools (platform.aitools.cfd) | qwen/qwen2.5-7b | 96.94 t/s (best 102.36, worst 81.76) | 0.80s | 5 |
| 17 | gpt.cosmoplat.com | qwen3-235b-fp8 | 94.11 t/s (best 99.14, worst 76.87) | 0.56s | 5 |
| 18 | AI Tools (platform.aitools.cfd) | qwen/qwen2.5-7b | 86.57 t/s (best 105.08, worst 53.78) | 2.66s | 10 |
| 19 | DashScope (dashscope.aliyuncs.com) | tongyi-intent-detect-v3 | 83.35 t/s (best 86.59, worst 79.67) | 0.57s | 5 |
| 20 | integrate.api.nvidia.com | z-ai/glm4.7 | 79.03 t/s (best 134.51, worst 34.59) | 22.47s | 15 |
| 21 | api.siliconflow.com | zai-org/GLM-4.7 | 74.72 t/s (best 78.41, worst 70.62) | 16.10s | 5 |
| 22 | SiliconFlow (api.siliconflow.cn) | Pro/zai-org/GLM-4.7 | 73.01 t/s (best 76.77, worst 69.39) | 17.14s | 5 |
| 23 | SiliconFlow (api.siliconflow.cn) | Pro/zai-org/GLM-4.7 | 72.42 t/s (best 76.59, worst 67.54) | 17.92s | 5 |
| 24 | ark (ark.cn-beijing.volces.com) | doubao-seed-1-6-flash-250828 | 69.65 t/s (best 81.25, worst 58.41) | 18.69s | 5 |
| 25 | Fireworks AI (api.fireworks.ai) | accounts/fireworks/models/glm-4p6 | 69.26 t/s (best 105.48, worst 0.00) | 10.03s | 5 |
| 26 | DashScope (dashscope.aliyuncs.com) | qwen3-235b-a22b-thinking-2507 | 68.48 t/s (best 79.59, worst 52.70) | 8.90s | 5 |
| 27 | DashScope (dashscope.aliyuncs.com) | tongyi-xiaomi-analysis-pro | 65.76 t/s (best 69.19, worst 60.09) | 0.64s | 5 |
| 28 | DashScope (dashscope.aliyuncs.com) | qwen-plus-latest | 57.95 t/s (best 63.81, worst 52.43) | 0.69s | 5 |
| 29 | DashScope (dashscope.aliyuncs.com) | qwen-plus | 57.34 t/s (best 66.27, worst 48.55) | 0.66s | 5 |
| 30 | DashScope (dashscope.aliyuncs.com) | qwen-plus-2025-12-01 | 56.15 t/s (best 64.05, worst 51.52) | 0.67s | 5 |
| 31 | AI Tools (platform.aitools.cfd) | zhipu/glm-4v-flash | 55.76 t/s (best 66.69, worst 47.64) | 2.95s | 5 |
| 32 | sd.rnglg2.top:30000 | gpt-5.2 | 55.45 t/s (best 62.04, worst 45.84) | 1.43s | 5 |
| 33 | Fireworks AI (api.fireworks.ai) | accounts/fireworks/models/qwen3-vl-235b-a22b-thinking | 51.54 t/s (best 57.56, worst 49.21) | 1.25s | 5 |
| 34 | AI Tools (platform.aitools.cfd) | zhipu/glm-4.5-flash | 51.13 t/s (best 56.72, worst 44.03) | 12.30s | 5 |
| 35 | zenmux.ai | deepseek/deepseek-v3.2 | 51.09 t/s (best 60.04, worst 40.39) | 6.43s | 5 |
| 36 | 算了么 API (api.suanli.cn) | Qwen/QwQ-32B | 46.81 t/s (best 66.67, worst 35.03) | 18.74s | 5 |
| 37 | 心流 (apis.iflow.cn) | deepseek-v3.2 | 44.90 t/s (best 50.54, worst 39.85) | 1.14s | 10 |
| 38 | 算了么 API (api.suanli.cn) | Qwen/Qwen3-VL-32B-Thinking | 44.46 t/s (best 58.24, worst 30.68) | 14.29s | 5 |
| 39 | SiliconFlow (api.siliconflow.cn) | Pro/deepseek-ai/DeepSeek-V3.2 | 44.24 t/s (best 57.56, worst 11.81) | 6.05s | 5 |
| 40 | DashScope (dashscope.aliyuncs.com) | kimi-k2-thinking | 43.87 t/s (best 45.41, worst 42.37) | 14.94s | 5 |
| 41 | 智谱AI开放平台 (open.bigmodel.cn) | glm-4.7 | 38.93 t/s (best 64.40, worst 19.04) | 24.32s | 5 |
| 42 | DashScope (dashscope.aliyuncs.com) | deepseek-r1 | 37.34 t/s (best 45.53, worst 28.29) | 13.81s | 5 |
| 43 | DashScope (dashscope.aliyuncs.com) | qwen-max-latest | 37.07 t/s (best 45.29, worst 32.21) | 0.79s | 5 |
| 44 | api.siliconflow.com | deepseek-ai/DeepSeek-V3.2 | 35.84 t/s (best 49.95, worst 5.75) | 18.84s | 5 |
| 45 | Gitee AI (ai.gitee.com) | DeepSeek-V3.2 | 35.11 t/s (best 41.01, worst 28.33) | 6.18s | 5 |
| 46 | DashScope (dashscope.aliyuncs.com) | deepseek-v3.2 | 34.40 t/s (best 39.13, worst 29.81) | 0.69s | 10 |
| 47 | DeepSeek (api.deepseek.com) | deepseek-reasoner | 34.30 t/s (best 37.72, worst 32.37) | 6.77s | 5 |
| 48 | AI Tools (platform.aitools.cfd) | zhipu/glm-4-flash | 33.72 t/s (best 37.07, worst 28.66) | 1.53s | 5 |
| 49 | ark (ark.cn-beijing.volces.com) | deepseek-v3-2-251201 | 32.39 t/s (best 37.78, worst 28.61) | 1.23s | 5 |
| 50 | zenmux.ai | deepseek/deepseek-reasoner | 30.30 t/s (best 32.12, worst 27.88) | 7.02s | 5 |
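
The page does not say how the throughput, best, worst, and latency figures are derived, so the following is only a sketch of one plausible calculation. It assumes each test records a token count, a first token latency, and a total response time, and that throughput is measured over the generation phase after the first token arrives. `SpeedTest`, `throughput`, and `summarize` are hypothetical names used for illustration, not part of the leaderboard's tooling.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SpeedTest:
    """One hypothetical test run; field names are illustrative assumptions."""
    tokens_generated: int
    first_token_latency_s: float  # seconds from request to first token
    total_time_s: float           # seconds from request to last token


def throughput(test: SpeedTest) -> float:
    """Tokens per second over the generation phase (assumed definition)."""
    generation_time = test.total_time_s - test.first_token_latency_s
    if generation_time <= 0:
        # Treat a degenerate or failed run as zero throughput (assumption).
        return 0.0
    return test.tokens_generated / generation_time


def summarize(tests: list[SpeedTest]) -> dict:
    """Aggregate per-test figures into the columns shown in the table."""
    rates = [throughput(t) for t in tests]
    return {
        "avg_tps": round(mean(rates), 2),
        "best_tps": round(max(rates), 2),
        "worst_tps": round(min(rates), 2),
        "avg_first_token_latency_s": round(
            mean(t.first_token_latency_s for t in tests), 2
        ),
        "total_tests": len(tests),
    }


if __name__ == "__main__":
    runs = [SpeedTest(1000, 0.5, 2.5), SpeedTest(600, 1.0, 4.0)]
    print(summarize(runs))
```

Under these assumptions, ranking endpoints amounts to sorting the summaries by `avg_tps` in descending order. If the real benchmark divides by total response time instead, high first token latency would lower the reported throughput, so the two definitions can rank slow-starting endpoints differently.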