LMSpeed

The best API speed test tool

© 2026 LMSpeed. All Rights Reserved. Made by Nexmoe with ❤️
Community QQ group: 1034193296. Relay-site (中转站) operators are welcome to join to discuss trending AI topics such as newapi and openclaw, and to get the latest speed-test updates and feedback support.

NVIDIA NIM

NVIDIA NIM provides optimized AI model inference APIs for LLMs, vision, and embedding models through NVIDIA cloud infrastructure.

Categories

API relay / aggregator (中转站)

Available models include: OpenAI GPT-OSS, Qwen Qwen3 Next Instruct, Qwen Qwen3 Next Thinking, MetaAI Llama 4 Maverick 128e Instruct, DeepSeek DeepSeek R1, Minimax MiniMax-M2.1, Step 3.5 Flash, MoonshotAI Kimi K2.5, Minimax MiniMax-M2.5, MetaAI Llama 3.3 Nemotron Super V1.5, Gemma Gemma 3 It, ChatGLM Glm4.7, Qwen Qwen3 Coder Instruct, MetaAI Llama 3.1 Nemotron Instruct, MetaAI Llama 3.1 Instruct, MoonshotAI Kimi K2 Instruct, MetaAI Llama 3.1 Nemotron Ultra v1, MoonshotAI Kimi K2 Thinking, Gemma Gemma 2 It, Qwen Qwen3 5, Qwen DeepSeek R1 Distill Qwen, DeepSeek DeepSeek V3.1, MetaAI Llama3 Instruct, Mistral Mistral Small Instruct, Qwen Qwen3, ChatGLM Glm5, MetaAI Llama 3.1 Swallow Instruct V0.1, MetaAI Dracarys Llama 3.1 Instruct, DeepSeek DeepSeek V3.2, MetaAI Llama 3.2 Vision Instruct.

NVIDIA NIM offers 42 LLM API models.

Speed benchmark average: 63 tok/s.

NVIDIA NIM is an API aggregator, offering models from multiple vendors.
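The page does not define how its two headline metrics are computed. A minimal sketch of the common convention, assuming speed means completion tokens divided by generation time and latency means time to first token (the type and function names here are illustrative, not LMSpeed's actual code):

```python
from dataclasses import dataclass

@dataclass
class StreamSample:
    """Timestamps (in seconds) collected while streaming one completion."""
    t_request: float      # moment the request was sent
    t_first_token: float  # moment the first token arrived
    t_last_token: float   # moment the final token arrived
    n_tokens: int         # completion tokens counted

def latency_s(s: StreamSample) -> float:
    # Time to first token: the usual "latency" figure on provider pages.
    return s.t_first_token - s.t_request

def tokens_per_second(s: StreamSample) -> float:
    # Generation throughput: tokens emitted per second after the first token.
    return s.n_tokens / (s.t_last_token - s.t_first_token)

sample = StreamSample(t_request=0.0, t_first_token=0.2, t_last_token=5.2, n_tokens=1000)
print(round(latency_s(sample), 2))          # 0.2
print(round(tokens_per_second(sample), 2))  # 200.0
```

Under this reading, a provider can rank well on latency (fast first token) yet poorly on speed (slow generation), which is why the two leaderboards below differ.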

[NVIDIA NIM interface preview]

  • Avg Speed: 63.48 tok/s
  • Latency: 6.53 s
  • Total Tests: 650
  • Models: 42
  • Updated: 4/16/2026
  • Created At: 8/13/2025
  • Website

API Endpoints

  • Historical / Unverified
    https://www.nvidia.com
  • Historical / Unverified
    https://integrate.api.nvidia.com
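Aggregators of this kind typically expose an OpenAI-compatible chat-completions API. A minimal request-building sketch against the integrate.api.nvidia.com endpoint listed above (the `/v1/chat/completions` path and payload shape follow the OpenAI convention and are assumptions, not taken from this page; the API key and prompt are placeholders):

```python
import json
import urllib.request

BASE_URL = "https://integrate.api.nvidia.com/v1"  # endpoint listed above

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request.

    Assumes the endpoint is OpenAI-compatible; streaming is enabled so a
    client could time first and last token for speed/latency measurement.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("YOUR_KEY", "openai/gpt-oss-20b", "ping")
print(req.full_url)  # https://integrate.api.nvidia.com/v1/chat/completions
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) requires a valid key; only the request construction is shown here.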

Supported Models

Model                                     | Speed (tok/s) | Latency (s) | Tests
openai/gpt-oss-20b                        | 200.28        | 10.88       | 5
openai/gpt-oss-120b                       | 169.75        | 6.57        | 60
qwen/qwen3-next-80b-a3b-instruct          | 118.96        | 0.58        | 10
qwen/qwen3-next-80b-a3b-thinking          | 105.70        | 10.51       | 10
meta/llama-4-maverick-17b-128e-instruct   | 100.41        | 0.21        | 15
mistralai/mixtral-8x22b-instruct-v0.1     | 89.66         | 0.22        | 5
deepseek-ai/deepseek-r1                   | 87.63         | 8.96        | 15
minimaxai/minimax-m2.1                    | 86.36         | 2.88        | 30
marin/marin-8b-instruct                   | 84.25         | 0.44        | 5
stepfun-ai/step-3.5-flash                 | 81.82         | 4.79        | 10
microsoft/phi-4-mini-flash-reasoning      | 74.19         | 0.46        | 5
moonshotai/kimi-k2.5                      | 70.81         | 8.11        | 10
minimaxai/minimax-m2.5                    | 66.85         | 3.83        | 30
nvidia/llama-3.3-nemotron-super-49b-v1.5  | 57.55         | 11.11       | 10
google/gemma-3-27b-it                     | 57.40         | 0.20        | 10
z-ai/glm4.7                               | 56.96         | 27.72       | 25
ai21labs/jamba-1.5-large-instruct         | 55.60         | 0.29        | 10
stockmark/stockmark-2-100b-instruct       | 55.31         | 0.74        | 5
qwen/qwen3-coder-480b-a35b-instruct       | 55.07         | 1.32        | 25
nvidia/llama-3.1-nemotron-70b-instruct    | 52.19         | 0.23        | 5
Showing 20 of 42 models.

Leaderboard Rankings

  • Speed: 201.8 tokens/s (ranked #9/100)
  • Latency: 0.19 s (ranked #1/100)