Leaderboard
Multi-dimensional rankings based on model speed tests and provider health checks. Compare providers, endpoints, and reliability at a glance.
Ranked by average time to first token; lower is better for responsiveness.
| Rank | Provider | Model | First Token Latency | Avg Tokens per Second (t/s) | Total Tests |
|---|---|---|---|---|---|
| 1 | | google/gemma-2-27b-it | 0.22 s (best 0.19, worst 0.35) | 43.48 | 5 |
| 2 | | deepseek-v3.1 | 0.30 s (best 0.27, worst 0.34) | 57.51 | 5 |
| 3 | GPT Load (gpt-load.shiho.top) | llama3.1-8b | 0.35 s (best 0.30, worst 0.47) | 2191.20 | 10 |
| 4 | GPT Load (gpt-load.shiho.top) | qwen-3-coder-480b | 0.35 s (best 0.33, worst 0.38) | 894.38 | 5 |
| 5 | GPT Load (gpt-load.shiho.top) | llama-4-scout-17b-16e-instruct | 0.36 s (best 0.31, worst 0.46) | 1372.80 | 5 |
| 6 | 心流 (apis.iflow.cn) | qwen3-235b-a22b-instruct | 0.39 s (best 0.36, worst 0.45) | 24.36 | 10 |
| 7 | SophNet (www.sophnet.com) | DeepSeek-V3-Fast | 0.39 s (best 0.29, worst 0.55) | 86.92 | 5 |
| 8 | GPT Load (gpt-load.shiho.top) | qwen-3-32b | 0.40 s (best 0.34, worst 0.43) | 705.04 | 5 |
| 9 | SophNet (www.sophnet.com) | DeepSeek-V3.1-Fast | 0.40 s (best 0.28, worst 0.70) | 146.54 | 15 |
| 10 | GPT Load (gpt-load.shiho.top) | llama-4-maverick-17b-128e-instruct | 0.41 s (best 0.37, worst 0.52) | 1052.78 | 5 |
| 11 | GPT Load (gpt-load.shiho.top) | qwen-3-235b-a22b-thinking-2507 | 0.44 s (best 0.37, worst 0.57) | 579.82 | 5 |
| 12 | Wxstudio (wxstudio.thuarchdog.com:60089) | DeepSeek-R1-Distill-Qwen-32B-AWQ | 0.44 s (best 0.30, worst 0.83) | 40.05 | 10 |
| 13 | AI Tools (platform.aitools.cfd) | google/gemini-2.0-flash-exp | 0.45 s (best n/a, worst 2.89) | 28.22 | 30 |
| 14 | TokenPony (api.tokenpony.cn) | qwen3-next-80b-a3b-instruct | 0.45 s (best 0.30, worst 0.86) | 164.04 | 5 |
| 15 | GPT Load (gpt-load.shiho.top) | qwen-3-235b-a22b-instruct-2507 | 0.45 s (best 0.32, worst 0.85) | 754.92 | 5 |
| 16 | Wxstudio (wxstudio.thuarchdog.com:60089) | Qwen3-8B | 0.47 s (best 0.26, worst 1.32) | 55.80 | 10 |
| 17 | 心流 (apis.iflow.cn) | deepseek-v3.1 | 0.48 s (best 0.40, worst 0.56) | 25.60 | 5 |
| 18 | LLM.PM (api-proxy.me) | moonshotai/kimi-k2-instruct | 0.49 s (best 0.38, worst 0.59) | 149.26 | 5 |
| 19 | AI Tools (platform.aitools.cfd) | zhipu/glm-4v-flash | 0.49 s (best 0.31, worst 0.82) | 51.60 | 5 |
| 20 | Wxstudio (wxstudio.thuarchdog.com:60089) | Qwen2.5-7B-Instruct | 0.50 s (best 0.29, worst 0.82) | 44.65 | 5 |
| 21 | x.ai (api.x.ai) | grok-4-fast-non-reasoning | 0.51 s (best 0.48, worst 0.54) | 151.11 | 5 |
| 22 | SiliconFlow (api.siliconflow.cn) | tencent/Hunyuan-MT-7B | 0.51 s (best 0.48, worst 0.54) | 88.25 | 5 |
| 23 | GPT Load (gpt-load.shiho.top) | llama-3.3-70b | 0.51 s (best 0.37, worst 0.89) | 1062.69 | 5 |
| 24 | Atlas Cloud (api.atlascloud.ai) | deepseek-ai/DeepSeek-V3.1-Terminus | 0.54 s (best 0.43, worst 0.78) | 60.47 | 5 |
| 25 | GPT Load (allaiload.dpdns.org) | DeepSeek-V3.1 | 0.55 s (best 0.47, worst 0.84) | 257.63 | 5 |
| 26 | GPT Load (allaiload.dpdns.org) | models/gemini-2.5-flash-preview-09-2025 | 0.57 s (best 0.54, worst 0.62) | 175.68 | 5 |
| 27 | AIHubMix (aihubmix.com) | gemini-2.0-flash | 0.59 s (best 0.48, worst 0.82) | 163.32 | 5 |
| 28 | integrate.api.nvidia.com | moonshotai/kimi-k2-instruct | 0.59 s (best 0.41, worst 0.79) | 62.37 | 10 |
| 29 | GPT Load (allaiload.dpdns.org) | qwen-3-235b-a22b-instruct-2507 | 0.60 s (best 0.36, worst 1.15) | 724.96 | 5 |
| 30 | SiliconFlow (api.siliconflow.cn) | Qwen/Qwen3-Next-80B-A3B-Instruct | 0.62 s (best 0.60, worst 0.67) | 105.32 | 5 |
| 31 | Fireworks AI (api.fireworks.ai) | accounts/fireworks/models/qwen3-235b-a22b-instruct-2507 | 0.62 s (best 0.48, worst 0.93) | 78.64 | 5 |
| 32 | SiliconFlow (api.siliconflow.cn) | Qwen/Qwen2.5-72B-Instruct | 0.66 s (best 0.55, worst 0.91) | 33.41 | 15 |
| 33 | integrate.api.nvidia.com | qwen/qwen3-next-80b-a3b-instruct | 0.67 s (best 0.38, worst 1.32) | 79.54 | 5 |
| 34 | GPT Load (gpt-load.shiho.top) | gpt-oss-120b | 0.70 s (best 0.55, worst 1.01) | 846.32 | 5 |
| 35 | integrate.api.nvidia.com | moonshotai/kimi-k2-instruct-0905 | 0.72 s (best 0.43, worst 1.27) | 47.28 | 5 |
| 36 | AI Tools (platform.aitools.cfd) | moonshotai/kimi-k2 | 0.73 s (best n/a, worst 5.45) | 14.56 | 35 |
| 37 | Lido LLM (new-api.shiho.top) | ai.dev/gemini-2.5-flash-lite | 0.78 s (best 0.65, worst 1.21) | 405.65 | 5 |
| 38 | gmi-serving (api.gmi-serving.com) | deepseek-ai/DeepSeek-V3-0324 | 0.85 s (best 0.39, worst 1.81) | 35.64 | 5 |
| 39 | Gpt Load M9sw (gpt-load-m9sw.onrender.com) | tencent/Hunyuan-MT-7B | 0.85 s (best 0.70, worst 0.97) | 83.16 | 5 |
| 40 | AIHubMix (aihubmix.com) | DeepSeek-V3-Fast | 0.89 s (best 0.70, worst 1.54) | 79.64 | 5 |
| 41 | Tencent (api.lkeap.cloud.tencent.com) | deepseek-v3-0324 | 0.89 s (best 0.65, worst 1.04) | 23.29 | 5 |
| 42 | Fireworks AI (api.fireworks.ai) | accounts/fireworks/models/deepseek-v3p1-terminus | 0.89 s (best 0.51, worst 2.04) | 112.67 | 5 |
| 43 | AI Tools (platform.aitools.cfd) | zhipu/glm-4-flash | 0.89 s (best 0.38, worst 5.29) | 32.59 | 10 |
| 44 | LLM.PM (api-proxy.me) | grok-4-fast-non-reasoning | 0.90 s (best 0.86, worst 0.94) | 167.62 | 5 |
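The latency and throughput columns above can be reproduced from the timestamps of streamed response chunks. A minimal sketch follows; `summarize_run`, `aggregate`, and the chunk-event shape are hypothetical helpers for illustration, not the leaderboard's actual test harness:

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    first_token_latency: float  # seconds until the first streamed chunk arrived
    tokens_per_second: float    # completion tokens divided by total stream time


def summarize_run(chunk_times: list[float], total_tokens: int) -> RunStats:
    """Compute the two per-run metrics shown on the leaderboard.

    chunk_times: arrival times of streamed chunks, in seconds, measured
    from the moment the request was sent (assumed sorted ascending).
    total_tokens: completion tokens counted across all chunks.
    """
    if not chunk_times or total_tokens <= 0:
        raise ValueError("need at least one chunk and a positive token count")
    ttft = chunk_times[0]          # "First Token Latency" for this run
    duration = chunk_times[-1]     # time from request to last chunk
    tps = total_tokens / duration if duration > 0 else float("inf")
    return RunStats(first_token_latency=ttft, tokens_per_second=tps)


def aggregate(runs: list[RunStats]) -> tuple[float, float, float, float]:
    """Roll runs up into the table's avg / best / worst latency and avg t/s."""
    latencies = [r.first_token_latency for r in runs]
    throughputs = [r.tokens_per_second for r in runs]
    return (
        sum(latencies) / len(latencies),   # average first-token latency
        min(latencies),                    # "best"
        max(latencies),                    # "worst"
        sum(throughputs) / len(throughputs),
    )
```

For example, a run whose chunks arrive at 0.30 s, 0.45 s, and 0.60 s with 24 completion tokens yields a 0.30 s first-token latency and 40.0 t/s; repeating the run five times and calling `aggregate` produces one leaderboard row.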