NVIDIA provides AI and accelerated computing APIs for building, customizing, and deploying multimodal generative AI models.

| Model | Speed | Latency | Tests |
|---|---|---|---|
| openai/gpt-oss-20b | 239.61 t/s | 10.88s | 5 |
| openai/gpt-oss-120b | 157.21 t/s | 10.90s | 35 |
| qwen/qwen3-next-80b-a3b-instruct | 118.96 t/s | 0.58s | 10 |
| mistralai/mixtral-8x22b-instruct-v0.1 | 89.66 t/s | 0.22s | 5 |
| deepseek-ai/deepseek-r1 | 87.63 t/s | 8.96s | 15 |
| nvidia/llama-3.3-nemotron-super-49b-v1.5 | 79.68 t/s | 0.28s | 5 |
| microsoft/phi-4-mini-flash-reasoning | 74.19 t/s | 0.46s | 5 |
| z-ai/glm4.7 | 66.49 t/s | 27.72s | 25 |
| google/gemma-3-27b-it | 62.41 t/s | 0.20s | 5 |
| ai21labs/jamba-1.5-large-instruct | 55.60 t/s | 0.29s | 10 |
| moonshotai/kimi-k2-instruct | 54.35 t/s | 0.53s | 15 |
| nvidia/llama-3.1-nemotron-70b-instruct | 52.19 t/s | 0.23s | 5 |
| meta/llama-3.1-70b-instruct | 51.18 t/s | 0.23s | 5 |
| moonshotai/kimi-k2-instruct-0905 | 47.28 t/s | 0.72s | 5 |
| 01-ai/yi-large | 43.74 t/s | 0.22s | 5 |
| google/gemma-2-27b-it | 43.69 t/s | 0.23s | 10 |
| deepseek-ai/deepseek-r1-distill-qwen-14b | 40.96 t/s | 0.49s | 5 |
| deepseek-ai/deepseek-v3.1 | 38.10 t/s | 2.50s | 25 |

| Time | Model | Speed | Latency |
|---|---|---|---|
| Feb 26, 01:17 PM | qwen/qwen3.5-397b-a17b | 23.36 t/s | 0.71s |
| Feb 23, 05:56 PM | openai/gpt-oss-120b | 107.32 t/s | 1.37s |
| Feb 21, 10:46 PM | deepseek-ai/deepseek-v3.2 | 21.47 t/s | 3.09s |
| Feb 4, 07:28 AM | z-ai/glm4.7 | 34.87 t/s | 46.41s |
| Feb 4, 01:33 AM | z-ai/glm4.7 | 60.49 t/s | 24.80s |
| Jan 13, 06:48 AM | z-ai/glm4.7 | 89.00 t/s | 18.23s |
| Jan 13, 02:34 AM | z-ai/glm4.7 | 106.60 t/s | 15.93s |
| Jan 12, 02:08 PM | z-ai/glm4.7 | 41.49 t/s | 33.23s |
| Jan 5, 03:40 PM | moonshotai/kimi-k2-thinking | 19.63 t/s | 36.88s |
| Dec 21, 01:18 PM | Unknown | - | - |
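
The two tables appear to be related: averaging the five z-ai/glm4.7 rows of the run log reproduces that model's summary row (66.49 t/s, 27.72s). The sketch below checks that arithmetic. It assumes the summary figures are plain arithmetic means over individual runs, which is an inference from the numbers shown here rather than something the page states. The name `summarize` is illustrative only.

```python
from statistics import mean

# (speed in tokens/s, latency in seconds) for the z-ai/glm4.7 rows
# copied from the run log above.
glm47_runs = [
    (34.87, 46.41),
    (60.49, 24.80),
    (89.00, 18.23),
    (106.60, 15.93),
    (41.49, 33.23),
]


def summarize(runs):
    """Return (mean speed, mean latency), each rounded to two decimals.

    Assumption: the summary table uses an unweighted mean over runs.
    """
    speeds, latencies = zip(*runs)
    return round(mean(speeds), 2), round(mean(latencies), 2)


avg_speed, avg_latency = summarize(glm47_runs)
print(f"{avg_speed:.2f} t/s, {avg_latency:.2f}s")  # prints "66.49 t/s, 27.72s"
```

The same check cannot be repeated for the other models, because the run log lists only recent runs and most models appear in it once or not at all.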