Hugging Face is a collaboration platform for machine learning, hosting over 1 million models, 250,000 datasets, and 400,000 applications. It supports various modalities including text, image, video, audio, and 3D. Key offerings include the Hugging Face Hub for public hosting, Inference Providers API for accessing 45,000+ models from leading AI providers, and Compute solutions like Inference Endpoints and Spaces with GPU support. Enterprise plans start at $20/user/month, and Compute pricing begins at $0.60/hour for GPU. Use cases include model discovery, application deployment, and team collaboration in AI development.

Model

Input ($/M)

Output ($/M)

Audit

Speed

Latency

Qwen/Qwen3.5-9B

—

109.0 t/s

7.99 s

meta-llama/Llama-3.3-70B-Instruct

—

416.2 t/s

0.26 s

moonshotai/Kimi-K2-Instruct:novita

—

48.8 t/s

1.21 s

Qwen/Qwen3-Coder-480B-A35B-Instruct:novita

—

63.6 t/s

1.06 s

openai/gpt-oss-120b:novita

—

240.3 t/s

1.26 s

openai/gpt-oss-20b:novita

—

155.7 t/s

3.56 s

zai-org/GLM-4.5:novita

—

35.4 t/s

1.39 s

Qwen/Qwen3-235B-A22B:novita

—

34.1 t/s

1.12 s

Time

Model

Speed

Latency

Apr 1, 06:16 AM

Qwen/Qwen3.5-9B

108.96 tok/s

7.99s

Jan 14, 02:25 AM

meta-llama/Llama-3.3-70B-Instruct

416.19 tok/s

0.26s

Aug 13, 03:29 PM

moonshotai/Kimi-K2-Instruct:novita

48.81 tok/s

1.21s

Aug 13, 03:26 PM

Qwen/Qwen3-Coder-480B-A35B-Instruct:novita

63.59 tok/s

1.06s

Aug 13, 02:45 PM

openai/gpt-oss-20b:novita

155.67 tok/s

3.56s

Aug 13, 02:41 PM

openai/gpt-oss-120b:novita

240.32 tok/s

1.26s

Aug 13, 02:35 PM

zai-org/GLM-4.5:novita

35.36 tok/s

1.39s

Aug 13, 03:46 AM

Qwen/Qwen3-235B-A22B:novita

34.12 tok/s

1.12s

Provider

Why compare

Models

Free

Avg price

Speed

30d uptime

Hugging Face

router-huggingface-co

Hugging Face Router provides intelligent model routing across Inference Providers, offering OpenAI-compatible API access to open-source models.

Current provider baseline

N/A

138 tok/s

0.5%

Kriora

api-kriora-com

Provides OpenAI-compatible APIs and managed GPU instances for deploying and scaling open-source AI models.

Faster measured speed
Higher 30-day availability
Broader model coverage
Same provider category

N/A

565 tok/s

99.6%

openrouter

A unified API interface providing access to over 300 models from 60+ providers, including OpenAI, Anthropic, and Google.

Higher 30-day availability
Broader model coverage
Same provider category

221

N/A

86 tok/s

99.7%

siliconflow

Provides cost-effective generative AI cloud services based on open-source models for text, image, video, and audio generation.

Higher 30-day availability
Broader model coverage
Same provider category

N/A

53 tok/s

79.9%

nvidia-nim

NVIDIA NIM provides optimized AI model inference APIs for LLMs, vision, and embedding models through NVIDIA cloud infrastructure.

Higher 30-day availability
Broader model coverage
Same provider category

N/A

66 tok/s

99.4%

qiniu-2

七牛云提供AI大模型推理服务，包括多种模型调用、智能问答助手、代码助手等企业级AI解决方案。

Higher 30-day availability
Broader model coverage
Same provider category

N/A

42 tok/s

99.4%

api-inference-modelscope-cn

ModelScope provides model inference API access to a wide range of open-source AI models via OpenAI-compatible endpoints.

Higher 30-day availability
Broader model coverage
Same provider category

N/A

42 tok/s

76.3%

Notes

Health checks: Scope: the 72-hour chart and recent availability measure API connectivity only. Each bar summarizes one hour of checks. Targets: LMSpeed tries the configured health check URL and provider status URL first, then API endpoints derived from known API hosts and recent speed-test base URLs. A website host is considered only when it looks like an API endpoint. Probe steps: each candidate goes through DNS lookup, TCP connection, TLS handshake for HTTPS, and an HTTP HEAD request with redirects followed. Probing stops after the first reachable candidate. Reachable criteria: every required network step must succeed. An HTTP response below 500 is treated as reachable, including 401 because it confirms that an authenticated API endpoint responded, except for statuses classified as blocked. Blocked results: HTTP 403, 429, 521, 525, and 530, plus detected WAF or Cloudflare challenges, are shown as blocked and excluded from availability calculations because LMSpeed cannot determine whether the API itself is down. Model availability: when a dedicated test key is configured, LMSpeed sends an authenticated GET request to a derived /models endpoint and compares returned model IDs with this provider's listed models. These per-model results appear in Models & Pricing and are not included in the provider connectivity percentage. Timeouts: TCP connection, TLS handshake, HTTP connectivity, and model requests each use a 20-second timeout. A full run can take longer when several candidates are tried. Frequency: a background worker checks all providers every 5 minutes by default. The 72-hour chart combines those samples into hourly bars, and the schedule may be changed by the service operator. Limit: automated samples are not an SLA and do not guarantee account quota, every model, every region, or successful completion requests. Check the provider's own status page before making operational decisions.

Domain Rating data is sourced from Ahrefs. It is a 0–100 backlink-based domain strength signal and does not measure API speed or reliability.

Announcements and FAQ are read from this provider's NewAPI status snapshot when available. LMSpeed stores the original content and optional English translations from the provider status source, then shows the localized fields on this page.

Hugging Face

Hugging Face

API Endpoints

Health Check

API Benchmarks & Pricing

Recent Test Records

Similar API Provider Alternatives to Compare

Notes

Similar API Provider Alternatives to Compare

Provider	Why compare	Models	Avg price	Speed	30d uptime
Hugging Face router-huggingface-co Hugging Face Router provides intelligent model routing across Inference Providers, offering OpenAI-compatible API access to open-source models.	Current provider baseline	7	N/A	138 tok/s	0.5%
Kriora api-kriora-com Provides OpenAI-compatible APIs and managed GPU instances for deploying and scaling open-source AI models.	Faster measured speed Higher 30-day availability Broader model coverage Same provider category	9	N/A	565 tok/s	99.6%
openrouter A unified API interface providing access to over 300 models from 60+ providers, including OpenAI, Anthropic, and Google.	Higher 30-day availability Broader model coverage Same provider category	221	N/A	86 tok/s	99.7%
siliconflow Provides cost-effective generative AI cloud services based on open-source models for text, image, video, and audio generation.	Higher 30-day availability Broader model coverage Same provider category	60	N/A	53 tok/s	79.9%
nvidia-nim NVIDIA NIM provides optimized AI model inference APIs for LLMs, vision, and embedding models through NVIDIA cloud infrastructure.	Higher 30-day availability Broader model coverage Same provider category	53	N/A	66 tok/s	99.4%
qiniu-2 七牛云提供AI大模型推理服务，包括多种模型调用、智能问答助手、代码助手等企业级AI解决方案。	Higher 30-day availability Broader model coverage Same provider category	40	N/A	42 tok/s	99.4%
api-inference-modelscope-cn ModelScope provides model inference API access to a wide range of open-source AI models via OpenAI-compatible endpoints.	Higher 30-day availability Broader model coverage Same provider category	35	N/A	42 tok/s	76.3%

Hugging Face

Hugging Face

API Endpoints

About Hugging Face

Health Check

API Benchmarks & Pricing

Recent Test Records

Similar API Provider Alternatives to Compare

Notes

Similar API Provider Alternatives to Compare