GPT Load is an open-source intelligent load balancing platform designed to manage and distribute API requests across multiple AI providers. It provides a unified interface for accessing various AI models, helping developers optimize performance and reliability. Key features include load balancing, failover handling, and request routing. The platform supports integration with different AI APIs, allowing users to switch between providers seamlessly. Typical use cases include AI application development, API management, and ensuring high availability for AI services. The project is available on GitHub under the MIT license.
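As a rough illustration of the load balancing and failover behavior described above, the sketch below rotates requests across a pool of upstreams and falls over to the next one when a call fails. It is not GPT Load's actual implementation: `RoundRobinPool`, `AllUpstreamsFailed`, and the upstream names are hypothetical, and the `call` argument stands in for whatever sends the real API request.

```python
from typing import Callable, List, Sequence


class AllUpstreamsFailed(Exception):
    """Raised when every upstream in the pool was tried and failed."""


class RoundRobinPool:
    """Hypothetical pool: rotates through upstreams, fails over on error."""

    def __init__(self, upstreams: Sequence[str]) -> None:
        if not upstreams:
            raise ValueError("at least one upstream is required")
        self._upstreams: List[str] = list(upstreams)
        self._next = 0  # index of the upstream that starts the next request

    def send(self, call: Callable[[str], str]) -> str:
        """Try each upstream at most once, starting at the rotation point."""
        n = len(self._upstreams)
        start = self._next
        self._next = (self._next + 1) % n  # advance rotation for the next request
        errors = []
        for offset in range(n):
            upstream = self._upstreams[(start + offset) % n]
            try:
                return call(upstream)
            except Exception as exc:  # any failure triggers failover
                errors.append(f"{upstream}: {exc}")
        raise AllUpstreamsFailed("; ".join(errors))


def fake_call(upstream: str) -> str:
    """Stand-in for a real request; provider-a is simulated as unavailable."""
    if upstream == "provider-a":
        raise ConnectionError("timeout")
    return f"ok from {upstream}"


pool = RoundRobinPool(["provider-a", "provider-b", "provider-c"])
print(pool.send(fake_call))  # provider-a fails, request falls over to provider-b
print(pool.send(fake_call))  # rotation now starts at provider-b
```

A production balancer would typically add weights, per-upstream health state, and retry limits; this sketch only shows rotation plus failover.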

API Benchmarks & Pricing

Model	Input ($/M) / Output ($/M) / Audit	Speed	Latency
llama-3.3-70b	—	984.9 tok/s	0.45 s
llama-4-maverick-17b-128e-instruct	—	825.7 tok/s	0.41 s
llama3.1-8b	—	1629.3 tok/s	0.36 s
qwen-3-235b-a22b-thinking-2507	—	579.8 tok/s	0.44 s
openai/gpt-oss-120b	668480100	481.9 tok/s	0.43 s

Recent Test Records

Time	Model	Speed	Latency
Jun 3, 06:58 PM	openai/gpt-oss-120b	481.90 tok/s	0.43 s
Dec 25, 02:06 PM	llama3.1-8b	2142.09 tok/s	0.19 s
Dec 25, 02:02 PM	llama-3.3-70b	1374.74 tok/s	0.25 s
Sep 21, 06:22 PM	llama3.1-8b	1834.10 tok/s	0.35 s
Sep 21, 06:21 PM	llama-4-maverick-17b-128e-instruct	825.70 tok/s	0.41 s
Sep 21, 06:19 PM	llama-3.3-70b	890.30 tok/s	0.51 s
Sep 21, 06:18 PM	qwen-3-235b-a22b-thinking-2507	579.82 tok/s	0.44 s
Sep 21, 06:17 PM	llama3.1-8b	2117.91 tok/s	0.34 s
Jun 8, 11:00 PM	llama3.1-8b	806.21 tok/s	0.47 s
Jun 8, 10:59 PM	llama-3.3-70b	794.97 tok/s	0.52 s

Notes

Health checks:
Scope: the 72-hour chart and recent availability measure API connectivity only. Each bar summarizes one hour of checks.
Targets: LMSpeed tries the configured health check URL and provider status URL first, then API endpoints derived from known API hosts and recent speed-test base URLs. A website host is considered only when it looks like an API endpoint.
Probe steps: each candidate goes through DNS lookup, TCP connection, TLS handshake for HTTPS, and an HTTP HEAD request with redirects followed. Probing stops after the first reachable candidate.
Reachable criteria: every required network step must succeed. An HTTP response below 500 is treated as reachable, including 401 because it confirms that an authenticated API endpoint responded, except for statuses classified as blocked.
Blocked results: HTTP 403, 429, 521, 525, and 530, plus detected WAF or Cloudflare challenges, are shown as blocked and excluded from availability calculations because LMSpeed cannot determine whether the API itself is down.
Model availability: when a dedicated test key is configured, LMSpeed sends an authenticated GET request to a derived /models endpoint and compares returned model IDs with this provider's listed models. These per-model results appear in Models & Pricing and are not included in the provider connectivity percentage.
Timeouts: TCP connection, TLS handshake, HTTP connectivity, and model requests each use a 20-second timeout. A full run can take longer when several candidates are tried.
Frequency: a background worker checks all providers every 5 minutes by default. The 72-hour chart combines those samples into hourly bars, and the schedule may be changed by the service operator.
Limit: automated samples are not an SLA and do not guarantee account quota, every model, every region, or successful completion requests. Check the provider's own status page before making operational decisions.
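The reachable and blocked rules in the health-check note can be sketched as a small classifier. This is an illustration under stated assumptions, not LMSpeed's code: the names `Outcome`, `classify`, and `availability` are hypothetical, challenge detection is reduced to a boolean flag, and the percentage formula (reachable samples divided by non-blocked samples) is an assumption that the note implies but does not spell out.

```python
from enum import Enum
from typing import Iterable, Optional

# Statuses the note lists as "blocked".
BLOCKED_STATUSES = frozenset({403, 429, 521, 525, 530})


class Outcome(Enum):
    REACHABLE = "reachable"
    BLOCKED = "blocked"
    DOWN = "down"


def classify(network_ok: bool, status: Optional[int],
             challenge: bool = False) -> Outcome:
    """Classify a single probe result.

    network_ok: DNS, TCP and (for HTTPS) TLS all succeeded.
    status: HTTP status of the HEAD response, or None if none arrived.
    challenge: True when a WAF/Cloudflare challenge was detected
               (how detection works is out of scope for this sketch).
    """
    if not network_ok or status is None:
        return Outcome.DOWN
    # Blocked is checked first, so 521/525/530 are not counted as down.
    if challenge or status in BLOCKED_STATUSES:
        return Outcome.BLOCKED
    if status < 500:
        return Outcome.REACHABLE  # includes 401: an API endpoint answered
    return Outcome.DOWN


def availability(outcomes: Iterable[Outcome]) -> Optional[float]:
    """Percent reachable among non-blocked samples (assumed formula).

    Returns None when every sample was blocked, since nothing can be counted.
    """
    counted = [o for o in outcomes if o is not Outcome.BLOCKED]
    if not counted:
        return None
    up = sum(1 for o in counted if o is Outcome.REACHABLE)
    return 100.0 * up / len(counted)


samples = [
    classify(True, 200),    # reachable
    classify(True, 401),    # reachable
    classify(True, 429),    # blocked, excluded from the percentage
    classify(True, 503),    # down
    classify(False, None),  # down (network step failed)
]
print(availability(samples))  # 50.0
```

With checks every 5 minutes, an hourly bar would summarize up to 12 such samples; the same `availability` function could be applied per hour.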

Domain Rating data is sourced from Ahrefs. It is a 0–100 backlink-based domain strength signal and does not measure API speed or reliability.

Announcements and FAQ are read from this provider's NewAPI status snapshot when available. LMSpeed stores the original content and optional English translations from the provider status source, then shows the localized fields on this page.

Similar API Provider Alternatives to Compare

Provider	Why compare	Models	Free	Avg price	Speed	30d uptime
GPT Load (Shiho) gpt-load-shiho-top GPT Load (Shiho) is an OpenAI-compatible API load balancing service hosted at gpt-load.shiho.top, distributing requests across multiple AI model providers for improved reliability.	Current provider baseline	10	0	N/A	1164 tok/s	99.5%
N1N api-n1n-ai N1N provides API access to a wide range of AI models including GPT-4, Claude 3, Gemini, and others for text, image, and video generation.	Higher 30-day availability More free-model options Broader model coverage	177	6	$8.77/M	90 tok/s	99.7%
api-kr777-top CaMeL AI provides an OpenAI-compatible API gateway with extensive model coverage and pricing options.	More free-model options Broader model coverage	194	6	$78.76/M	93 tok/s	99.2%
new-waadri-top WAADRI runs a unified AI model gateway that exposes aggregated model access through OpenAI-, Claude-, and Gemini-compatible interfaces.	More free-model options Broader model coverage	192	710	N/A	N/A	5.5%
180txt-cn 180txt API provides an OpenAI-compatible API relay for multiple AI models.	More free-model options Broader model coverage	37	8	$31.07/M	42 tok/s	4.8%
meta-api Provides API services with a model marketplace and developer tools.	Higher 30-day availability Broader model coverage	34	0	$31.81/M	N/A	99.8%
apitoken-online ApiToken Online offers an OpenAI-compatible API gateway at apitoken.online with transparent per-model pricing and multi-provider routing.	More free-model options Broader model coverage	33	11	$21.19/M	81 tok/s	72.9%
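The page does not say how the "Why compare" tags are produced, but the rows above are consistent with a simple comparison of each provider against the baseline row. The sketch below encodes that assumed rule; `Provider` and `why_compare` are hypothetical names, and the figures are copied from the table.

```python
from typing import List, NamedTuple


class Provider(NamedTuple):
    name: str
    models: int
    free: int
    uptime_30d: float  # percent


def why_compare(candidate: Provider, baseline: Provider) -> List[str]:
    """Return the tags a candidate earns under the assumed comparison rule."""
    tags = []
    if candidate.uptime_30d > baseline.uptime_30d:
        tags.append("Higher 30-day availability")
    if candidate.free > baseline.free:
        tags.append("More free-model options")
    if candidate.models > baseline.models:
        tags.append("Broader model coverage")
    return tags


baseline = Provider("gpt-load-shiho-top", models=10, free=0, uptime_30d=99.5)
n1n = Provider("api-n1n-ai", models=177, free=6, uptime_30d=99.7)
meta = Provider("meta-api", models=34, free=0, uptime_30d=99.8)

print(why_compare(n1n, baseline))   # all three tags, as in the N1N row
print(why_compare(meta, baseline))  # availability and coverage only
```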
