llama-4-scout-16e-instruct
Developer: Meta
Llama 4 Scout 16e Instruct by Meta is available through 21 API providers on LMSpeed. Input pricing ranges from $0.010 to $535.71 per million tokens across providers. In speed benchmarks, the fastest provider reaches 603 tok/s.
Compare speed and latency performance across all API providers.
| Provider | Speed | Latency | Tests |
|---|---|---|---|
| Fanyi 963312 `llama-4-scout-17b-16e-instruct` | 602.69 tok/s | 0.60s | 10 |
| ChatST API `meta-llama/llama-4-scout-17b-16e-instruct` | 444.26 tok/s | 0.63s | 5 |
| `meta-llama/Llama-4-Scout-17B-16E-Instruct` | 118.72 tok/s | 1.04s | 5 |
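Speed and latency pull in different directions for short and long responses, so a rough way to compare the listed providers is to estimate total response time for a given output length. The sketch below assumes that "Latency" means the delay before the first token and that "Speed" is steady output throughput; the page does not define either term, so treat both as assumptions. The function names `estimated_seconds` and `rank` are hypothetical helpers, not part of any LMSpeed API.

```python
# Figures copied from the provider table. The third row lists no provider
# name, so its model ID is used as the label.
PROVIDERS = [
    ("Fanyi 963312", 602.69, 0.60),
    ("ChatST API", 444.26, 0.63),
    ("meta-llama/Llama-4-Scout-17B-16E-Instruct", 118.72, 1.04),
]


def estimated_seconds(speed_tok_s, latency_s, output_tokens):
    """Rough end-to-end time: initial latency plus generation time.

    Assumes latency is time to first token and speed is constant
    output throughput in tokens per second.
    """
    return latency_s + output_tokens / speed_tok_s


def rank(output_tokens):
    """Return (label, estimated seconds) pairs, fastest first."""
    rows = [
        (label, estimated_seconds(speed, latency, output_tokens))
        for label, speed, latency in PROVIDERS
    ]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for label, seconds in rank(500):
        print(f"{label}: {seconds:.2f}s")
```

With these three rows the ordering is the same at any output length, because the fastest provider also has the lowest latency. The estimate becomes more useful when a provider with lower throughput has a shorter latency, in which case it can win on very short responses.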
gpt-oss
GPT-OSS is an open-source language model offering advanced reasoning, code generation, and multimodal capabilities.
deepseek-v3-2
DeepSeek V3.2 is a large language model in the DeepSeek V3 series, offering advanced reasoning, code generation, and multimodal capabilities.
kimi-k2-5
Moonshot Kimi K2.5 is a large language model in the Kimi series, offering advanced reasoning, code generation, and multimodal capabilities.
minimax-m2-5
MiniMax M2.5 is MiniMax's flagship text model for coding and agents, with SOTA-level programming and agentic performance, improved token efficiency, and fast high-TPS API deployment.
glm-5-1
Zhipu GLM-5.1 is a next-generation GLM model aimed at frontier reasoning, coding, and bilingual agent applications.
minimax-m2-7
MiniMax M2.7 is a large language model in the MiniMax series, offering advanced reasoning, code generation, and multimodal capabilities.
Rankings are based on community-submitted tests and periodic health probes. Advisory only, not official data.