Browse canonical models across providers with performance and coverage highlights.
Qwen-72B is a 72-billion-parameter language model in Alibaba Cloud's Qwen (Tongyi Qianwen) family, designed for multilingual instruction following, reasoning, and general-purpose text generation.
Input price
From $2.64/M
Avg speed
—
First token
—
Providers
15
Alibaba Qwen3 Instruct is an instruction-tuned variant in the Qwen series, optimized for following instructions and conversational tasks.
Input price
From $0.010/M
Avg speed
—
First token
—
Providers
7
DeepSeek V4 Pro is a large language model in the DeepSeek series, offering advanced reasoning, code generation, and multimodal capabilities.
Input price (+9 free)
From $0.0007/M
Avg speed
43 t/s
First token
8.51s
Providers
177
DeepSeek V4 Flash is a fast, cost-efficient language model in the DeepSeek V4 family, optimized for low-latency chat, coding assistance, and high-throughput API workloads while retaining strong reasoning quality.
Input price (+9 free)
From $0.0007/M
Avg speed
71 t/s
First token
5.89s
Providers
181
Xiaomi MiMo-V2.5-TTS-VoiceDesign is the voice-design variant of MiMo-V2.5-TTS on the Xiaomi MiMo API platform, enabling custom voice creation through stylistic prompts. Pricing: free during the limited-time launch period (0x token consumption).
Input price (+1 free)
From $0.0071/M
Avg speed
—
First token
—
Providers
38
Xiaomi MiMo-V2.5-TTS-VoiceClone is the voice-cloning variant of MiMo-V2.5-TTS on the Xiaomi MiMo API platform, enabling speech synthesis with cloned target voices. Pricing: free during the limited-time launch period (0x token consumption).
Input price (+1 free)
From $0.0071/M
Avg speed
—
First token
—
Providers
41
Xiaomi MiMo-V2.5-TTS is the text-to-speech model in the V2.5 series on the Xiaomi MiMo API platform, providing high-quality speech synthesis. Pricing: free during the limited-time launch period (0x token consumption).
Input price (+1 free)
From $0.0071/M
Avg speed
—
First token
—
Providers
41
Xiaomi MiMo-V2.5 is a native omnimodal sparse MoE model (310B total, 15B active) with unified text, image, video, and audio understanding, built on the MiMo-V2-Flash backbone with dedicated vision and audio encoders. It supports up to 1M tokens of context, strong agentic workflows, and open weights on Hugging Face.
Input price (+1 free)
From $0.0096/M
Avg speed
85 t/s
First token
3.14s
Providers
88
Alibaba Qwen3 Instant is a fast and efficient language model in the Qwen series, optimized for quick responses and high throughput.
Input price
From $0.010/M
Avg speed
—
First token
—
Providers
3
Xiaomi MiMo-V2.5-Pro is a large open-source language model in the MiMo series, offering advanced reasoning and general-purpose capabilities.
Input price (+1 free)
From $0.0000/M
Avg speed
48 t/s
First token
5.35s
Providers
92
MiniMax M2 Her is a language model in the MiniMax series, offering general-purpose reasoning, dialogue, and text generation capabilities.
Input price
From $1.03/M
Avg speed
—
First token
—
Providers
4
Xiaomi MiMo-V2-TTS is a text-to-speech model in the MiMo series, optimized for natural speech synthesis and voice generation tasks.
Input price (+4 free)
From $0.0071/M
Avg speed
—
First token
—
Providers
38
Anthropic Claude Opus 4.7 Max is a high-capability language model in the Claude series, offering enhanced reasoning, code generation, and multimodal capabilities.
Input price
From $0.0014/M
Avg speed
—
First token
—
Providers
24
Anthropic Claude Opus 4.7 targets frontier-level analysis, complex coding, and autonomous workflows that require deep multi-step reasoning.
Input price (+1 free)
From $0.0014/M
Avg speed
43 t/s
First token
3.86s
Providers
174
DeepSeek V4 is a large language model in the DeepSeek series, offering advanced reasoning, code generation, and multimodal capabilities.
Input price
From $0.050/M
Avg speed
47 t/s
First token
2.49s
Providers
12
Alibaba Qwen1.8B Long Context is a compact language model in the Qwen series, optimized for low-latency responses and efficient inference.
Input price
From $0.050/M
Avg speed
—
First token
—
Providers
16
Alibaba Qwen1.8B is a compact language model in the Qwen series, optimized for low-latency responses and efficient inference.
Input price
From $0.050/M
Avg speed
—
First token
—
Providers
16
Aion RP Llama 3.1 is a roleplay-tuned variant in the Aion series, optimized for character-driven dialogue and creative writing.
Input price
From $0.401/M
Avg speed
—
First token
—
Providers
6
Nous Research Hermes 3 is a generalist instruct model fine-tuned on Meta Llama 3.1, with strong reasoning, roleplay, multi-turn chat, tool calling, and structured JSON output.
Input price (+1 free)
From $0.0050/M
Avg speed
—
First token
—
Providers
24
LFM 2.5 1.2B Instruct is a compact language model in the LFM series, optimized for low-latency responses and efficient inference.
Input price (+2 free)
From $0.0010/M
Avg speed
—
First token
—
Providers
20
Meta Llama 3.1 extends the Llama 3 family with stronger reasoning, tool use, and long-context support across 8B to 405B scales.
Input price
From $0.0007/M
Avg speed
—
First token
—
Providers
27
Meta Llama 3.3 is an updated Llama 3 open model with improved instruction following, multilingual support, and efficient inference.
Input price
From $0.100/M
Avg speed
985 t/s
First token
0.45s
Providers
14
Meta Llama 3.3 Instruct is an instruction-tuned variant in the Llama series, optimized for following instructions and conversational tasks.
Input price
From $1.00/M
Avg speed
48 t/s
First token
0.93s
Providers
5
Dracarys Llama 3.1 Instruct is an instruction-tuned variant optimized for instruction following and conversational tasks.
Input price (+3 free)
From $0.010/M
Avg speed
18 t/s
First token
0.59s
Providers
25