Leaderboard
Multi-dimensional rankings based on model speed tests and provider health checks. Compare providers, endpoints, and reliability at a glance.
Ranked by average time to first token (TTFT); lower is better for responsiveness.
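The two core metrics here, time to first token and tokens per second, can be computed from any streaming response. Below is a minimal sketch; `fake_stream` is a hypothetical stand-in for a provider's token stream, and the throughput formula (tokens divided by total streaming time, TTFT included) is an assumption about how this leaderboard measures it.

```python
import time

def measure_stream(token_iter):
    """Consume a token stream and return (ttft_seconds, tokens_per_second).

    TTFT is the delay until the first token arrives. Throughput is
    token count / total elapsed time -- a simplification; some harnesses
    exclude the TTFT window from the throughput denominator.
    """
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_iter:
        if ttft is None:
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    tps = count / total if total > 0 else 0.0
    return ttft, tps

def fake_stream(n_tokens=20, first_delay=0.05, gap=0.005):
    """Hypothetical stand-in for a streaming chat-completion response."""
    time.sleep(first_delay)            # simulated time to first token
    for i in range(n_tokens):
        if i:
            time.sleep(gap)            # simulated inter-token gap
        yield "tok"

if __name__ == "__main__":
    ttft, tps = measure_stream(fake_stream())
    print(f"TTFT: {ttft:.2f} s, throughput: {tps:.1f} t/s")
```

The Best/Worst columns in the table are simply the minimum and maximum TTFT observed across that endpoint's test runs.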
| Rank | Provider | Model | Avg TTFT (s) | Best (s) | Worst (s) | Throughput (t/s) | Tests |
|---|---|---|---|---|---|---|---|
| 1 | — | google/gemma-3-27b-it | 0.20 | 0.15 | 0.29 | 62.41 | 5 |
| 2 | — | 01-ai/yi-large | 0.22 | 0.19 | 0.32 | 43.74 | 5 |
| 3 | integrate.api.nvidia.com | mistralai/mixtral-8x22b-instruct-v0.1 | 0.22 | 0.18 | 0.33 | 89.66 | 5 |
| 4 | integrate.api.nvidia.com | nvidia/llama-3.1-nemotron-70b-instruct | 0.23 | 0.16 | 0.40 | 52.19 | 5 |
| 5 | integrate.api.nvidia.com | meta/llama-3.1-70b-instruct | 0.23 | 0.17 | 0.40 | 51.18 | 5 |
| 6 | integrate.api.nvidia.com | google/gemma-2-27b-it | 0.24 | 0.21 | 0.33 | 43.90 | 5 |
| 7 | Zhipu AI Open Platform (open.bigmodel.cn) | glm-z1-flash | 0.25 | 0.20 | 0.33 | 133.79 | 5 |
| 8 | New API (oneapi.352287.xyz) | allam-2-7b | 0.27 | 0.23 | 0.30 | 337.15 | 5 |
| 9 | integrate.api.nvidia.com | ai21labs/jamba-1.5-large-instruct | 0.29 | 0.23 | 0.49 | 55.60 | 10 |
| 10 | llm.imerji.cn | Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 | 0.36 | 0.31 | 0.54 | 89.43 | 5 |
| 11 | AI Tools (platform.aitools.cfd) | google/gemini-2.0-flash-exp | 0.39 | — | 3.58 | 26.25 | 70 |
| 12 | integrate.api.nvidia.com | moonshotai/kimi-k2-instruct | 0.40 | 0.32 | 0.57 | 38.32 | 5 |
| 13 | Zhipu AI Open Platform (open.bigmodel.cn) | glm-4-flash | 0.44 | 0.29 | 0.76 | 34.60 | 10 |
| 14 | Mistral AI (api.mistral.ai) | mistral-medium-latest | 0.45 | 0.38 | 0.64 | 62.82 | 5 |
| 16 | integrate.api.nvidia.com | microsoft/phi-4-mini-flash-reasoning | 0.46 | 0.32 | 0.94 | 74.19 | 5 |
| 17 | API Usage Guide (api.openai-proxy.org) | gpt-4.1-2025-04-14 | 0.48 | 0.43 | 0.53 | 102.12 | 5 |
| 18 | integrate.api.nvidia.com | mistralai/mistral-small-24b-instruct | 0.49 | 0.36 | 0.99 | 29.68 | 10 |
| 19 | Yuegle (api.yuegle.com) | gemini-2.5-flash-lite | 0.52 | 0.48 | 0.55 | 335.31 | 5 |
| 20 | Mine (ai.081007.xyz) | command-a-03-2025 | 0.52 | 0.42 | 0.87 | 120.65 | 5 |
| 21 | integrate.api.nvidia.com | microsoft/phi-3-medium-128k-instruct | 0.53 | 0.39 | 1.02 | 18.27 | 5 |
| 22 | SiliconFlow (api.siliconflow.cn) | Qwen/Qwen2-7B-Instruct | 0.53 | 0.49 | 0.56 | 63.29 | 5 |
| 23 | AI Tools (platform.aitools.cfd) | qwen/qwen3-coder | 0.56 | — | 3.12 | 12.98 | 25 |
| 24 | SiliconFlow (api.siliconflow.cn) | THUDM/glm-4-9b-chat | 0.56 | 0.52 | 0.68 | 77.52 | 5 |
| 25 | integrate.api.nvidia.com | deepseek-ai/deepseek-r1-distill-qwen-32b | 0.59 | 0.43 | 1.21 | 33.97 | 5 |
| 26 | Yuegle (api.yuegle.com) | gemini-2.5-flash-lite-preview-06-17 | 0.61 | 0.56 | 0.67 | 401.84 | 5 |
| 27 | 算了么 API (api.suanli.cn) | QwQ-32B | 0.63 | 0.59 | 0.70 | 33.29 | 5 |
| 28 | AI Tools (platform.aitools.cfd) | zhipu/glm-4v-flash | 0.68 | 0.29 | 2.12 | 54.39 | 5 |
| 29 | Awa1 (api.awa1.fun) | qwen3-coder-30b-a3b-instruct | 0.69 | 0.50 | 1.16 | 112.76 | 5 |
| 30 | Yuegle (api.yuegle.com) | Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 | 0.70 | 0.63 | 0.74 | 63.17 | 5 |
| 31 | AI Tools (platform.aitools.cfd) | zhipu/glm-4-flash | 0.72 | 0.40 | 24.43 | 35.72 | 1320 |
| 32 | Undy API (vip.undyingapi.com) | gpt-5-chat | 0.74 | 0.62 | 0.85 | 131.48 | 5 |
| 33 | SiliconFlow (api.siliconflow.cn) | Qwen/Qwen3-235B-A22B-Instruct-2507 | 0.75 | 0.65 | 0.86 | 15.82 | 5 |
| 34 | SophNet (www.sophnet.com) | DeepSeek-v3 | 0.78 | 0.32 | 1.62 | 35.13 | 20 |
| 35 | SiliconFlow (api.siliconflow.cn) | deepseek-ai/DeepSeek-V2.5 | 0.79 | 0.69 | 0.89 | 15.00 | 5 |
| 36 | text.pollinations.ai | evil | 0.81 | 0.27 | 2.58 | 1475.97 | 25 |
| 37 | Cotton API (gemini.nkbpal.cn) | gemini-2.5-flash-nothinking | 0.82 | 0.65 | 1.25 | 174.04 | 5 |
| 38 | AI Tools (platform.aitools.cfd) | moonshotai/kimi-k2 | 0.82 | — | 3.53 | 10.61 | 60 |
| 39 | New API (tbai.xin) | gpt-4.1-nano | 0.82 | 0.52 | 1.12 | 132.11 | 5 |
| 40 | DashScope (dashscope.aliyuncs.com) | tongyi-intent-detect-v3 | 0.86 | 0.60 | 1.78 | 91.73 | 5 |
| 41 | AI Tools (platform.aitools.cfd) | qwen/qwen2.5-7b | 0.87 | 0.67 | 0.98 | 29.15 | 5 |
| 42 | AI Tools (platform.aitools.cfd) | zhipu/glm-4-9b | 0.87 | 0.69 | 1.26 | 71.74 | 5 |
| 43 | Tianyi Cloud (wishub-x1.ctyun.cn) | DeepSeek-R1-昇腾版 | 0.88 | 0.57 | 1.20 | 19.41 | 5 |
| 44 | Hugging Face (router.huggingface.co) | openai/gpt-oss-120b | 0.92 | 0.65 | 1.12 | 225.67 | 5 |
| 46 | ChatAnywhere (api.chatanywhere.tech) | gpt-5-chat-latest | 0.95 | 0.78 | 1.30 | 113.81 | 10 |
| 47 | DashScope (dashscope.aliyuncs.com) | qwen-plus-latest | 0.95 | 0.67 | 1.84 | 30.78 | 10 |
| 48 | SiliconFlow (api.siliconflow.cn) | zai-org/GLM-4.5-Air | 0.95 | 0.81 | 1.08 | 26.04 | 5 |
| 49 | Cotton API (gemini.nkbpal.cn) | gpt-4.1-nano-2025-04-14 | 0.95 | 0.58 | 1.50 | 104.17 | 5 |
| 50 | 共绩算力 (d08011731-minicpm4-8blatest-2824-9z9f7zk2-11434.550c.cloud) | minicpm4-8b:latest | 0.96 | 0.40 | 3.18 | 159.84 | 5 |