NVIDIA provides AI and accelerated computing APIs for building, customizing, and deploying multimodal generative AI models.

| Model | Speed | Latency | Tests |
|---|---|---|---|
| openai/gpt-oss-20b | 239.61 t/s | 10.88s | 5 |
| openai/gpt-oss-120b | 165.53 t/s | 12.48s | 30 |
| qwen/qwen3-next-80b-a3b-instruct | 118.96 t/s | 0.58s | 10 |
| mistralai/mixtral-8x22b-instruct-v0.1 | 89.66 t/s | 0.22s | 5 |
| deepseek-ai/deepseek-r1 | 87.63 t/s | 8.96s | 15 |
| nvidia/llama-3.3-nemotron-super-49b-v1.5 | 79.68 t/s | 0.28s | 5 |
| z-ai/glm4.7 | 79.03 t/s | 22.47s | 15 |
| microsoft/phi-4-mini-flash-reasoning | 74.19 t/s | 0.46s | 5 |
| google/gemma-3-27b-it | 62.41 t/s | 0.20s | 5 |
| ai21labs/jamba-1.5-large-instruct | 55.60 t/s | 0.29s | 10 |
| moonshotai/kimi-k2-instruct | 54.35 t/s | 0.53s | 15 |
| nvidia/llama-3.1-nemotron-70b-instruct | 52.19 t/s | 0.23s | 5 |
| meta/llama-3.1-70b-instruct | 51.18 t/s | 0.23s | 5 |
| moonshotai/kimi-k2-instruct-0905 | 47.28 t/s | 0.72s | 5 |
| 01-ai/yi-large | 43.74 t/s | 0.22s | 5 |
| google/gemma-2-27b-it | 43.69 t/s | 0.23s | 10 |
| deepseek-ai/deepseek-r1-distill-qwen-14b | 40.96 t/s | 0.49s | 5 |
| deepseek-ai/deepseek-v3.1 | 38.10 t/s | 2.50s | 25 |

| Time | Model | Speed | Latency |
|---|---|---|---|
| Jan 13, 06:48 AM | z-ai/glm4.7 | 89.00 t/s | 18.23s |
| Jan 13, 02:34 AM | z-ai/glm4.7 | 106.60 t/s | 15.93s |
| Jan 12, 02:08 PM | z-ai/glm4.7 | 41.49 t/s | 33.23s |
| Jan 5, 03:40 PM | moonshotai/kimi-k2-thinking | 19.63 t/s | 36.88s |
| Dec 21, 01:18 PM | Unknown | - | - |
| Dec 15, 02:06 PM | openai/gpt-oss-120b | 251.35 t/s | 0.80s |
| Dec 15, 12:12 PM | Unknown | - | - |
| Dec 15, 03:57 AM | Unknown | - | - |
| Nov 8, 12:52 PM | openai/gpt-oss-120b | 149.96 t/s | 18.38s |
| Nov 8, 12:45 PM | nvidia/llama-3.3-nemotron-super-49b-v1.5 | 79.68 t/s | 0.28s |
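
The per-run rows above can be averaged per model as a rough cross-check against the summary table. The following is a minimal sketch, not the page's own method: the `RECENT` list and the `mean_by_model` helper are hypothetical names introduced here, the values are copied by hand from the rows above, and runs listed as Unknown with no measurement are assumed to be skipped rather than counted as zero.

```python
from collections import defaultdict

# Rows copied from the recent-tests table: (model, speed in t/s, latency in s).
# None marks a run that reported no measurement (the "Unknown" rows).
RECENT = [
    ("z-ai/glm4.7", 89.00, 18.23),
    ("z-ai/glm4.7", 106.60, 15.93),
    ("z-ai/glm4.7", 41.49, 33.23),
    ("moonshotai/kimi-k2-thinking", 19.63, 36.88),
    ("Unknown", None, None),
    ("openai/gpt-oss-120b", 251.35, 0.80),
    ("Unknown", None, None),
    ("Unknown", None, None),
    ("openai/gpt-oss-120b", 149.96, 18.38),
    ("nvidia/llama-3.3-nemotron-super-49b-v1.5", 79.68, 0.28),
]


def mean_by_model(rows):
    """Return {model: (mean speed, mean latency)}, ignoring rows without data."""
    grouped = defaultdict(list)
    for model, speed, latency in rows:
        if speed is None or latency is None:
            continue  # assumption: unmeasured runs are excluded, not treated as 0
        grouped[model].append((speed, latency))
    return {
        model: (
            sum(s for s, _ in vals) / len(vals),
            sum(l for _, l in vals) / len(vals),
        )
        for model, vals in grouped.items()
    }


if __name__ == "__main__":
    for model, (speed, latency) in mean_by_model(RECENT).items():
        print(f"{model}: {speed:.2f} t/s, {latency:.2f}s")
```

For z-ai/glm4.7 the three listed runs average to about 79.03 t/s, which matches the speed shown for that model in the summary table. The other models have too few listed runs here to compare meaningfully against their summary rows.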