LogoLMSpeed
  • Home
  • Free
  • Models
  • Providers
  • Docs
LogoLMSpeed

API Pricing · Speed · Security

Compare pricing, speed, and security in one shareable test report.

  • Pricing/rates / free tiers / coverage
  • Speed/latency / throughput / duration
  • Security/model / prompt / leakage

Pricing, speed, and API security in one workflow

Put per-token pricing, five-round benchmarks, health checks, and safety probes into one workflow before an API reaches your app.

API Pricing Comparison

Compare per-token pricing across 100+ providers, find cheaper APIs for each model, and track free tiers and credits.

Real-time Speed Benchmarks

Run a five-round benchmark with standardized prompts to measure first-token latency, output throughput, and response time.

API Security Audit

Audit any OpenAI-compatible API for model authenticity, hidden prompts, instruction tampering, stream integrity, and error leakage, then share a plain-language report.

Custom Endpoint Benchmarks

Enter a base URL, API key, and model ID to test official providers, proxies, relays, or self-hosted endpoints.

Speed Benchmark Analytics

Review first-token latency, output throughput, total duration, health, and recent probe signals to judge stability.

Real-time Streaming Results

Watch streaming output, progress, and summary results for every prompt so the benchmark is easy to review and share.

Frequently Asked Questions

Learn more about LMSpeed

How do I compare LLM API pricing across providers?

LMSpeed aggregates per-token pricing from 100+ API providers. Visit any model page to see a side-by-side pricing comparison table showing input and output rates per million tokens, so you can find the cheapest provider for each model.

Which LLM APIs are free?

Many providers offer free API tiers or credits for popular models like DeepSeek, Gemini, and Llama. Check our Free LLM API directory for a complete list of models with free access, including speed benchmarks for each free provider.

How does LMSpeed conduct speed benchmark testing?

LMSpeed employs a five-round continuous stress testing mechanism with standardized prompts. Token calculations are performed accurately using tiktoken, measuring output throughput (tokens per second) and first-token latency.

What does the API trust audit check?

It sends multiple safety probes to an OpenAI-compatible endpoint to check whether the model identity matches, hidden system prompts are injected, user instructions are rewritten, streaming responses stay intact, and errors leak sensitive implementation details. The result includes a risk score and a shareable report. API keys are only used for that audit and are not written to public reports.

How to compare speed between different API providers?

Use our performance leaderboards and model detail pages to visually compare API speed benchmarks across providers. The system ranks providers by throughput, latency, and health, helping you choose the fastest and most reliable API.

Is long-term performance monitoring supported?

Provider pages and the health leaderboard already show recent health checks, probe latency, success or failure status, and stability rankings. Broader continuous monitoring and alerting will keep expanding.

LogoLMSpeed

The best API speed test tool

GitHubGitHubTwitterX (Twitter)Email
Product
  • Features
  • Pricing
  • FAQ
Leaderboard
  • Overview
  • Speed Ranking
  • Latency Ranking
  • Health Ranking
  • Input Price
  • Output Price
  • Reasoning
  • Coding
Models
  • All Models
  • GPT
  • Claude
  • Gemini
  • DeepSeek
  • Llama
  • Qwen
Free Models
  • All Free Models
  • Free GPT
  • Free Claude
  • Free Gemini
  • Free DeepSeek
  • Free Llama
  • Free Qwen
Resources
  • Speed Test
  • Provider Directory
  • Documentation
Legal
  • Cookie Policy
  • Privacy Policy
  • Terms of Service
© 2026 LMSpeed All Rights Reserved.Made by Nexmoe with ❤️

Recent API Security Audits

Fresh relay audit reports with four health scores for model identity, prompt safety, endpoint profile, and response integrity.

Recent audit records

ProviderModelAuditedChecks
Codex APIgpt-5.5Jun 51006886100
猫羽霖APIqwen3.7-maxJun 5768486100
wcnbai.comClaude Opus 4.7Jun 58810086100
wcnbai.comClaude Opus 4.7Jun 5000100
internal-api.pvzflare.comgpt-5.5Jun 5000100
api.xpzhao.topClaude Opus 4.7Jun 5000100
api.xpzhao.topClaude Opus 4.6Jun 576100100100
算了么 APIfree:QwQ-32BJun 500088
daodunapi.comclaude-opus-4-8Jun 500088
ai-pixel.onlinegpt-5.5Jun 5000100
Browse audit reports

New LLM Models

The latest canonical models on LMSpeed — with pricing, provider coverage, and live speed benchmarks.

QwenQwen-72B

Qwen-72B is a 72-billion-parameter language model in Alibaba Cloud's Qwen (Tongyi Qianwen) family, designed for multilingual instruction following, reasoning, and general-purpose text generation.

Input price

From $2.64/M

Avg speed

—

First token

—

Providers

15

+5 more

QwenQwen3 Instruct

Alibaba Qwen3 Instruct is an instruction-tuned variant in the Qwen series, optimized for following instructions and conversational tasks.

Input price

From $0.010/M

Avg speed

—

First token

—

Providers

7

DeepSeekDeepSeek V4 Pro

DeepSeek V4 Pro is a large language model in the DeepSeek series, offering advanced reasoning, code generation, and multimodal capabilities.

Input price+10 free

From $0.0007/M

Avg speed

43 t/s

First token

8.53s

Providers

188

+141 more
Jun 5

DeepSeekDeepSeek V4 Flash

DeepSeek V4 Flash is a fast, cost-efficient language model in the DeepSeek V4 family, optimized for low-latency chat, coding assistance, and high-throughput API workloads while retaining strong reasoning quality.

Input price+10 free

From $0.0007/M

Avg speed

71 t/s

First token

5.87s

Providers

192

+144 more
Jun 4

MiMo-V2.5-TTS-VoiceDesign

Xiaomi MiMo-V2.5-TTS-VoiceDesign is the voice-design variant of MiMo-V2.5-TTS on the Xiaomi MiMo API platform, enabling custom voice creation through stylistic prompts. Pricing: free during the limited-time launch period (0x token consumption).

Input price+1 free

From $0.0071/M

Avg speed

—

First token

—

Providers

39

+20 more

MiMo-V2.5-TTS-VoiceClone

Xiaomi MiMo-V2.5-TTS-VoiceClone is the voice-cloning variant of MiMo-V2.5-TTS on the Xiaomi MiMo API platform, enabling speech synthesis with cloned target voices. Pricing: free during the limited-time launch period (0x token consumption).

Input price+1 free

From $0.0071/M

Avg speed

—

First token

—

Providers

42

+23 more

MiMo-V2.5-TTS

Xiaomi MiMo-V2.5-TTS is the text-to-speech model in the V2.5 series on the Xiaomi MiMo API platform, providing high-quality speech synthesis. Pricing: free during the limited-time launch period (0x token consumption).

Input price+1 free

From $0.0071/M

Avg speed

—

First token

—

Providers

42

+23 more

MiMo-V2.5

Xiaomi MiMo-V2.5 is a native omnimodal sparse MoE model (310B total, 15B active) with unified text, image, video, and audio understanding, built on the MiMo-V2-Flash backbone with dedicated vision and audio encoders. It supports up to 1M tokens of context, strong agentic workflows, and open weights on Hugging Face.

Input price+1 free

From $0.0096/M

Avg speed

85 t/s

First token

3.14s

Providers

92

+67 more
Jun 2

QwenQwen3 Instant

Alibaba Qwen3 Instant is a fast and efficient language model in the Qwen series, optimized for quick responses and high throughput.

Input price

From $0.010/M

Avg speed

—

First token

—

Providers

3

MiMo-V2.5-Pro

Xiaomi MiMo-V2.5-Pro is a large open-source language model in the MiMo series, offering advanced reasoning and general-purpose capabilities.

Input price+1 free

From $0.0000/M

Avg speed

48 t/s

First token

5.35s

Providers

96

+72 more
Jun 2

MiniMax M2 Her

MiniMax M2 Her is a language model in the MiniMax series, offering general-purpose reasoning, dialogue, and text generation capabilities.

Input price

From $1.03/M

Avg speed

—

First token

—

Providers

4

MiMo-V2-TTS

Xiaomi MiMo-V2-TTS is a text-to-speech model in the MiMo series, optimized for natural speech synthesis and voice generation tasks.

Input price+4 free

From $0.0071/M

Avg speed

—

First token

—

Providers

39

+21 more
Browse the full model directory

Compare LLM API Providers

Compare API pricing, speed benchmarks, and performance data across providers.

PackyAPI

PackyAPI (codex-api.packycode.com) is an OpenAI-compatible API relay for Codex and other models via a single interface.

Health

100%

Tests

15

Last check

Jun 5

API price

No health checks yet

6345ywz API

6345ywz API is an OpenAI-compatible API relay providing access to multiple AI models with competitive pricing.

Health

100%

Tests

260

Last check

Jun 5

API price

No health checks yet

Leonhard API

Codexe API provides an OpenAI-compatible API gateway at codexe.top for accessing multiple LLM models.

Health

100%

Tests

5

Last check

Jun 5

API price

No health checks yet

钠 API

Na API (naapi.cc) is an OpenAI-compatible LLM API gateway with competitive pricing and stable access to 100+ models from OpenAI, Anthropic, Google, and more.

Health

100%

Tests

206

Last check

Jun 5

API price

No health checks yet

ChooseC API

统一的 AI 模型聚合网关,一站接入 Claude、GPT、Qwen、DeepSeek、Kimi、GLM 等 260+ 主流大模型,全面兼容 OpenAI / Claude / Gemini 接口。

Health

99%

Tests

105

Last check

Jun 5

API price

No health checks yet

VSLLM

VSLLM runs a New API-powered AI gateway on vsllm.com for aggregated model access through a single endpoint.

Health

100%

Tests

135

Last check

Jun 5

API price

No health checks yet

YUNWU API

A unified API gateway providing access to multiple large language models with direct connectivity in China.

🇨🇳CountryChinaRelay

Health

100%

Tests

130

Last check

Jun 5

API price

No health checks yet

MiniMax

MiniMax provides multimodal AI models and APIs for text, speech (T2A/A2T), video, and music generation.

Official

Health

100%

Tests

200

Last check

Jun 5

API price

No health checks yet

阿里云百炼 DashScope

Alibaba Cloud DashScope provides AI model APIs including Qwen LLMs, vision, audio, and embedding models.

Inference

Health

0%

Tests

922

Last check

Jun 5

API price

No health checks yet

LiteRouter

A unified API gateway providing access to multiple AI models and LLMs from various providers through a single interface.

🇨🇳CountryChinaRelay

Health

100%

Tests

10

Last check

Jun 5

API price

No health checks yet

DeepSeek

DeepSeek provides API access to its latest large language models for text generation and coding tasks.

Official

Health

100%

Tests

407

Last check

Jun 5

API price

No health checks yet

NVIDIA NIM

NVIDIA NIM provides optimized AI model inference APIs for LLMs, vision, and embedding models through NVIDIA cloud infrastructure.

Inference

Health

100%

Tests

1,044

Last check

Jun 5

API price

No health checks yet

Compare all API providers