Groq provides fast and low-cost AI inference through its LPU architecture and GroqCloud platform.

| Model | Speed | Latency | Tests |
|---|---|---|---|
| openai/gpt-oss-20b | 755.20 t/s | 0.47s | 5 |
| openai/gpt-oss-120b | 456.69 t/s | 0.31s | 10 |
| qwen/qwen3-32b | 310.21 t/s | 0.18s | 5 |
| glm-4.5-air | 76.86 t/s | 8.82s | 5 |
| free:Qwen3-30B-A3B | 22.38 t/s | 10.19s | 20 |

| Time | Model | Speed | Latency |
|---|---|---|---|
| Dec 16, 11:25 AM | Unknown | - | - |
| Dec 16, 11:21 AM | Unknown | - | - |
| Dec 16, 11:17 AM | openai/gpt-oss-120b | 446.44 t/s | 0.34s |
| Dec 12, 12:40 AM | qwen/qwen3-32b | 310.21 t/s | 0.18s |
| Dec 12, 12:39 AM | openai/gpt-oss-20b | 755.20 t/s | 0.47s |
| Dec 12, 12:38 AM | openai/gpt-oss-120b | 466.94 t/s | 0.28s |
| Dec 8, 06:30 AM | free:Qwen3-30B-A3B | 18.89 t/s | 12.00s |
| Dec 8, 06:24 AM | free:Qwen3-30B-A3B | 23.01 t/s | 7.66s |
| Dec 8, 06:24 AM | free:Qwen3-30B-A3B | 19.23 t/s | 4.74s |
| Dec 8, 06:22 AM | glm-4.5-air | 76.86 t/s | 8.82s |
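
The per-model summary row for openai/gpt-oss-120b is consistent with a plain arithmetic mean of its two logged runs. The sketch below shows that aggregation, assuming (this is not confirmed by the page) that each summary row is an unweighted mean over individual tests. The `runs` list and the `summarize` function are hypothetical names introduced for illustration; the numbers are copied from the log above.

```python
from collections import defaultdict
from statistics import mean

# (model, speed in tokens/s, latency in seconds), copied from the test log.
# Only a few rows are included here for illustration.
runs = [
    ("openai/gpt-oss-120b", 446.44, 0.34),
    ("openai/gpt-oss-120b", 466.94, 0.28),
    ("openai/gpt-oss-20b", 755.20, 0.47),
    ("qwen/qwen3-32b", 310.21, 0.18),
]


def summarize(rows):
    """Group runs by model and return (mean speed, mean latency, test count).

    Assumption: an unweighted mean, rounded to two decimals.
    """
    grouped = defaultdict(list)
    for model, speed, latency in rows:
        grouped[model].append((speed, latency))
    return {
        model: (
            round(mean(s for s, _ in vals), 2),
            round(mean(l for _, l in vals), 2),
            len(vals),
        )
        for model, vals in grouped.items()
    }


summary = summarize(runs)
for model, (speed, latency, count) in summary.items():
    print(f"{model}: {speed:.2f} t/s, {latency:.2f}s over {count} test(s)")
```

For openai/gpt-oss-120b this gives 456.69 t/s and 0.31s, matching the summary row. The same calculation over the three logged free:Qwen3-30B-A3B runs does not reproduce that model's summary figures, presumably because the log lists only recent tests while the summary counts 20.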