Cerebras offers cloud-based AI APIs for high-performance inference and training of large language models and other AI workloads. The platform runs on Cerebras' custom hardware, the Wafer-Scale Engine (WSE), which is designed to accelerate compute-intensive AI tasks. Key capabilities include model serving, batch processing, and scalable training pipelines. Typical users are enterprises and researchers running demanding AI applications that benefit from specialized hardware acceleration.

API Benchmarks & Pricing

Model	Input ($/M)	Output ($/M)	Audit	Speed	Latency
zai-glm-4.7	—	—	—	400.0 t/s	3.57 s

Recent Test Records

Time	Model	Speed	Latency
Jan 13, 04:32 PM	zai-glm-4.7	400.04 tok/s	3.57 s

Notes

Health checks

Scope: the 72-hour chart and recent availability measure API connectivity only. Each bar summarizes one hour of checks.

Targets: LMSpeed tries the configured health check URL and provider status URL first, then API endpoints derived from known API hosts and recent speed-test base URLs. A website host is considered only when it looks like an API endpoint.

Probe steps: each candidate goes through DNS lookup, TCP connection, TLS handshake for HTTPS, and an HTTP HEAD request with redirects followed. Probing stops after the first reachable candidate.

Reachable criteria: every required network step must succeed. An HTTP response below 500 is treated as reachable, including 401 because it confirms that an authenticated API endpoint responded, except for statuses classified as blocked.

Blocked results: HTTP 403, 429, 521, 525, and 530, plus detected WAF or Cloudflare challenges, are shown as blocked and excluded from availability calculations because LMSpeed cannot determine whether the API itself is down.

Model availability: when a dedicated test key is configured, LMSpeed sends an authenticated GET request to a derived /models endpoint and compares returned model IDs with this provider's listed models. These per-model results appear in Models & Pricing and are not included in the provider connectivity percentage.

Timeouts: TCP connection, TLS handshake, HTTP connectivity, and model requests each use a 20-second timeout. A full run can take longer when several candidates are tried.

Frequency: a background worker checks all providers every 5 minutes by default. The 72-hour chart combines those samples into hourly bars, and the schedule may be changed by the service operator.

Limit: automated samples are not an SLA and do not guarantee account quota, every model, every region, or successful completion requests. Check the provider's own status page before making operational decisions.
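
The reachable and blocked rules in the health check notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not LMSpeed's implementation: the function names, the `challenge_detected` flag, and the choice to compute an hourly bar as the share of non-blocked samples that were reachable are all hypothetical. Only the status codes and the "below 500" rule come from the notes.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple

# Statuses the notes classify as "blocked" (excluded from availability).
BLOCKED_STATUSES = {403, 429, 521, 525, 530}


def classify_probe(status: Optional[int], challenge_detected: bool = False) -> str:
    """Classify one probe as 'reachable', 'blocked', or 'unreachable'.

    `status` is None when a network step (DNS, TCP, TLS, HTTP) failed.
    `challenge_detected` is a hypothetical flag for a WAF/Cloudflare challenge.
    """
    if status is None:
        return "unreachable"
    # Blocked is checked first because 521, 525, and 530 are also >= 500.
    if challenge_detected or status in BLOCKED_STATUSES:
        return "blocked"
    return "reachable" if status < 500 else "unreachable"


def hourly_availability(
    samples: Iterable[Tuple[datetime, str]],
) -> Dict[datetime, Optional[float]]:
    """Group (timestamp, outcome) samples into hourly bars.

    Assumption: a bar is the percentage of non-blocked samples that were
    reachable; an hour with only blocked samples yields None.
    """
    counts = defaultdict(lambda: [0, 0])  # hour -> [reachable, counted]
    for ts, outcome in samples:
        bucket = counts[ts.replace(minute=0, second=0, microsecond=0)]
        if outcome == "blocked":
            continue  # excluded from the availability calculation
        bucket[1] += 1
        if outcome == "reachable":
            bucket[0] += 1
    return {h: (100.0 * r / n if n else None) for h, (r, n) in counts.items()}


def missing_models(listed: Iterable[str], returned: Iterable[str]) -> List[str]:
    """Return listed model IDs that a /models response did not include."""
    return sorted(set(listed) - set(returned))
```

For example, `classify_probe(401)` gives `"reachable"` while `classify_probe(521)` gives `"blocked"`, matching the rule that a 401 confirms an authenticated endpoint responded and that Cloudflare-style 52x statuses say nothing about the API itself.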

Domain Rating data is sourced from Ahrefs. It is a 0–100 backlink-based domain strength signal and does not measure API speed or reliability.

Announcements and FAQ are read from this provider's NewAPI status snapshot when available. LMSpeed stores the original content and optional English translations from the provider status source, then shows the localized fields on this page.

Similar API Provider Alternatives to Compare

Provider	Description	Why compare	Models	Free	Avg price	Speed	30d uptime
Cerebras (api-cerebras-ai)	Provides AI inference and training APIs leveraging Cerebras hardware for large-scale model deployment.	Current provider baseline	1	0	N/A	400 tok/s	99.6%
Kriora (api-kriora-com)	Provides OpenAI-compatible APIs and managed GPU instances for deploying and scaling open-source AI models.	Faster measured speed; Higher 30-day availability; Broader model coverage; Same provider category	9	0	N/A	565 tok/s	99.7%
openrouter	A unified API interface providing access to over 300 models from 60+ providers, including OpenAI, Anthropic, and Google.	Higher 30-day availability; Broader model coverage; Same provider category	221	0	N/A	86 tok/s	99.7%
huawei-modelarts	Huawei ModelArts MaaS (Model as a Service) platform offering Pangu series models and third-party LLMs via OpenAI-compatible API.	Higher 30-day availability; Broader model coverage; Same provider category	10	0	N/A	30 tok/s	99.6%
api-suanli-cn	Gongji Compute (共绩算力) provides pay-as-you-go, standard API access to mainstream LLMs and elastic GPU computing for AI applications.	More free-model options; Broader model coverage; Same provider category	6	5	N/A	36 tok/s	77.8%
api-fireworks-ai	Fireworks AI provides a cloud platform for running and fine-tuning open-source AI models with optimized inference for production applications.	Higher 30-day availability; Broader model coverage; Same provider category	5	0	N/A	139 tok/s	99.8%
chutes	Chutes provides LLM inference API service, offering access to various open-source AI models through OpenAI-compatible endpoints.	Higher 30-day availability; Broader model coverage; Same provider category	2	0	N/A	27 tok/s	99.6%
