Groq provides fast and low-cost AI inference through its LPU architecture and GroqCloud platform.
Groq offers 19 LLM API models.
Speed benchmark average: 20 tok/s.

api.groq.comRankings are based on community-submitted tests and periodic health probes. Advisory only, not official data.
| Model | Speed | Latency | Tests |
|---|---|---|---|
meta-llama/llama-4-scout-17b-16e-instruct | 447.76 tok/s | 0.20s | 5 |
19.74 tok/s | 25.30s | 6532 | |
| Time | Model | Speed | Latency |
|---|---|---|---|
| Apr 25, 11:32 PM | meta-llama/llama-4-scout-17b-16e-instruct | 447.76 tok/s | 0.20s |
| Dec 23, 03:21 PM | free:Qwen3-30B-A3B | 23.23 tok/s | 18.61s |
| Dec 20, 02:59 AM | free:Qwen3-30B-A3B | 24.49 tok/s | 13.14s |
| Dec 18, 08:29 AM | free:Qwen3-30B-A3B | 30.07 tok/s | 9.45s |
| Dec 18, 04:35 AM | free:Qwen3-30B-A3B | 22.38 tok/s | 27.29s |
| Dec 17, 09:28 AM | free:Qwen3-30B-A3B | 24.97 tok/s | 15.03s |
| Dec 16, 12:22 PM | free:Qwen3-30B-A3B | 21.75 tok/s | 4.61s |
| Dec 16, 07:52 AM | free:Qwen3-30B-A3B | 19.47 tok/s | 23.58s |
| Dec 14, 04:14 PM | free:Qwen3-30B-A3B | 29.29 tok/s | 5.65s |
| Dec 13, 03:57 PM | free:Qwen3-30B-A3B | 21.04 tok/s | 1.45s |
52.99 tok/s |
8.89s |
| 20 |
newapi.ixio.cc
IXIOCCAPI is a unified API gateway for large language models, providing standardized endpoints for accessing multiple AI model providers.
cto.ntbsd.eu.org
APDSM runs a New API-based gateway on cto.ntbsd.eu.org for unified access to multiple AI models through a single endpoint.
91vip.futureppo.top
A non-profit API service providing access to various AI models including Codex, Claude Code, and Open Code, with specific unlimited-use groups.
chat-api4.087654.xyz
天絮 API provides an AI model relay service with multiple access points and stable connectivity.
napi.seaya.link
Seamee API provides an AI model relay for accessing multiple LLMs through OpenAI-compatible endpoints.
new-api.tommylam.me
TommyLam API offers an OpenAI-compatible API gateway for accessing multiple AI models at competitive rates.