LMSpeed

The best API speed test tool

© 2026 LMSpeed. All Rights Reserved. Made by Nexmoe with ❤️
KKSJ-AI

Relay

An AI model aggregation API platform offering multi-model access with OpenAI-compatible interfaces at low cost.

Gemini 2.0 Pro

KKSJ-AI offers 245 LLM API models.

Input pricing ranges from $0.0000 to $9.25 per million tokens.

Speed benchmark average: 49 tok/s.

Avg Speed: 49.44 tok/s
Latency: 1.77 s
Updated: 4/27/2026
Created At: 8/13/2025
Recharge Rate: ¥1.00 per $1 quota

Features

Drawing, Task, Data Export

Login Methods

API Endpoints

  • US Route
    https://api.kksj.org/

    Direct connection to main site

  • CF CDN Route
    https://cnapi.kksj.org

    Backup if main site is unreachable

  • CF CDN Route
    https://api.kksj.work

    Backup if main site is unreachable

  • Endpoint 4 (Historical / Unverified)
    https://api.kksj.org
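
The listing describes the CDN routes as backups for when the main site is unreachable. A minimal sketch of that fallback order follows, assuming an OpenAI-style request path (`/v1/chat/completions` is an assumption; the listing only says the interface is OpenAI-compatible). The `send` callback is a hypothetical stand-in for whatever HTTP client you use.

```python
from typing import Callable, Iterable

# Base URLs as listed above, in the order they appear.
BASE_URLS = [
    "https://api.kksj.org/",
    "https://cnapi.kksj.org",
    "https://api.kksj.work",
]
CHAT_PATH = "/v1/chat/completions"  # assumed OpenAI-style path, not confirmed by the listing


def endpoint_url(base: str, path: str = CHAT_PATH) -> str:
    """Join a base URL and a path without doubling or dropping the slash."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def first_working(bases: Iterable[str], send: Callable[[str], object]) -> object:
    """Call send(url) for each base in order; return the first result that does not raise."""
    last_error = None
    for base in bases:
        try:
            return send(endpoint_url(base))
        except OSError as exc:  # connection-level failures (urllib.error.URLError is a subclass)
            last_error = exc
    raise RuntimeError("all endpoints failed") from last_error
```

Passing the request function in as `send` keeps the fallback logic independent of any particular HTTP library and makes it easy to exercise without network access.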

About KKSJ-AI

Health Check

Recent availability: 100%
History: 72 points

API Speed Benchmarks

Model                     Speed        Latency  Tests
gemini-2.0-pro-exp-02-05  49.44 tok/s  1.77s    5

API Pricing per Token

Model                  Audit  Input ($/M)  Output ($/M)
mj_custom_zoom         -      Free         Free
mj_inpaint             -      Free         Free
cogvideox-result       -      $0.0000      $0.0000
luma-vip-task          -      $0.0000      $0.0000
luma-task              -      $0.0000      $0.0000
runway-video-task      -      $0.0000      $0.0000
vidu-task-get          -      $0.0000      $0.0000
runway-vip-video-task  -      $0.0000      $0.0000
fileupload             -      $0.0001      $0.0001
Grok                   -      $0.0014      $0.0014

Recent Test Records

Time              Model                     Speed        Latency
Mar 18, 10:33 AM  gemini-2.0-pro-exp-02-05  49.44 tok/s  1.77s

Announcements

default 4/26/2026

Due to recent risk control measures, the cost of the codex special offer group has increased significantly. To keep the group stable, its price will be adjusted to 0.6 yuan per unit (per $1 of quota), effective from 12:00 on May 2, 2026.

default 4/24/2026

Added a codex trusted access group. All accounts in this group have passed OpenAI trusted access and come with CTF-related system prompts. Fast mode is enabled by default for quicker requests, and some cybersecurity content is allowed. Trusted access supports legal and ethical cybersecurity work, including discovering and patching vulnerabilities, defensive attack chain simulation, and vulnerability research. Use is limited to systems you own or are explicitly authorized to assess.

default 4/23/2026

gpt-5.5 launched on the codex channel.

default 3/24/2026

To maintain stability of the claudemax group, the group multiplier is expected to be adjusted from 1.4 to 1.5.

default 3/8/2026

Added a limited-time special offer group for codex: a pure codex channel with the official native experience, at a limited-time price of 0.2 yuan per unit. The special offer group already supports 5.4.

default 9/18/2025

Added gpt-5 related models.

default 9/18/2025

Added claude-opus-4.1 related models.

default 9/18/2025

Gemini official update: gemini-2.5 related models are now available. All versions other than the official release have been delisted upstream. Please switch soon to avoid disruption.

default 9/18/2025

For gemini related models, use a model name with the -thinking suffix if you need the reasoning process in the output. By default the reasoning process is not output, but gemini still reasons, which makes the first token very slow. Some models can disable reasoning; for those, use the -nothinking suffix. 2.5flashlite has reasoning disabled by default.
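
The suffix convention above can be sketched as a small helper. This is an illustration only: the function name, the mode labels, and the base model name `gemini-2.5-flash` are hypothetical, and whether a given model accepts either suffix depends on the provider's model list.

```python
# Suffixes taken from the announcement; the mode labels are hypothetical.
_SUFFIXES = {
    "default": "",           # reasoning happens but is not included in the output
    "show": "-thinking",     # include the reasoning process in the output
    "off": "-nothinking",    # disable reasoning, only for models that allow it
}


def gemini_variant(base_model: str, reasoning: str = "default") -> str:
    """Return the model name to request for the chosen reasoning mode."""
    if reasoning not in _SUFFIXES:
        raise ValueError(f"unknown reasoning mode: {reasoning!r}")
    return base_model + _SUFFIXES[reasoning]


# Example with a hypothetical base model name.
print(gemini_variant("gemini-2.5-flash", "show"))
```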

default 9/18/2025

Most models are at official rates. Some models have higher costs, up to double the official rate. All prices are clearly marked.

default 9/18/2025

For details on each group, go to the token settings page and click on token groups.

default 9/18/2025

For easier calculation, the recharge price on this site has been changed to 1 yuan per unit. The usage price is unchanged: group multipliers have been adjusted accordingly, so the default group multiplier is 0.9 and usage still costs 0.9 yuan per unit.

Alipay online recharge is now supported. 1 yuan recharges 1 credit, and the default group multiplier is 0.9.

default 9/18/2025

This site's domain https://api.kksj.me will expire in December 2024. Please switch to https://api.kksj.org as soon as possible. The new domain is a long-term domain and will remain valid indefinitely. Please save it promptly.

default 9/18/2025

Domestic optimized domain with dynamic DNS resolution and CDN acceleration: https://cnapi.kksj.org. Recommended when the official domain is unavailable.

FAQ

What is a relay API? Why use a relay API?

Our relay API platform acts as an intelligent distribution system between users and the major model service providers. You still call the model service providers' APIs, but we handle user authentication, token distribution, billing management, and data queries through a unified interface. The core value includes: access to over 1000 mainstream models through a single interface; strong cost-effectiveness; and a shared compute pool built on our large user base, giving you access to very high API call rates (in the millions). In short, we lower technical and cost barriers so that every user can use powerful AI model services easily, safely, and efficiently.

Why does the relay API offer discounts compared to purchasing directly?

We serve global users by aggregating API call volumes from all users on our platform to negotiate exclusive discounts with model suppliers. With our collective purchasing power, we ensure the lowest prices that individual or enterprise users cannot obtain directly, passing these savings directly to each user.

Is there any difference in quality or speed compared to calling the official API directly?

There is no difference in quality. We essentially call the official API, and our global acceleration network is intended to improve request response times, so responses should be at least as fast as calling the official API on your own.

Why do a very small number of models give incorrect answers when asked about their version, knowledge cutoff, etc.?

First, note that answers from the web version (reverse-engineered) and the API version may differ. The web version typically includes built-in prompts to keep such answers accurate and matched to the knowledge base, while the API version usually only has updated training data without that adaptation. You can therefore compare our answers with those of the official API at any time to verify that they are consistent. The same comparison can also help identify dishonest resellers that substitute reverse-engineered web access for the official API.

How can I determine if it is a GPT-4 model?

You can use the following logic question to test:

Question: What is the relationship between Lu Xun and Zhou Shuren?
GPT-3.5: Lu Xun and Zhou Shuren are two different people.
GPT-4: Lu Xun and Zhou Shuren are the same person.

How are fees calculated, and are there discounts compared to the official pricing?

Text-based models, for example, are generally billed by tokens, with 1K tokens costing xx USD, which is the same billing method as the official one. Our discount is in the exchange rate: purchasing USD through official channels uses the real-time exchange rate (approximately 1:7), while on our platform you can buy USD quota at a discounted rate (check the recharge page for the specific rate).

Why does it return "upstream load" or "this error is not common"?

In most cases, it's because the content format you sent is unsupported or incorrect. You can contact customer service for assistance.

Why is the completion response sometimes empty?

In most cases, it's because the text content you sent contains filtered prohibited words, so it was intercepted.

Does it support high-concurrency scenarios like immersive translation?

Yes, it does.

Why does vision sometimes fail to recognize images?

It's related to image format, MIME type, etc. Generally, images that can be directly downloaded are fine.
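
One way to rule out MIME-type problems is to send the image inline with an explicit type. The sketch below builds a base64 data URL, a form that OpenAI-style vision requests commonly accept in an image URL field; whether this provider accepts it for every model is an assumption, and the function name is hypothetical.

```python
import base64
import mimetypes


def image_data_url(filename: str, data: bytes) -> str:
    """Build a data URL whose MIME type is guessed from the file extension."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Rejecting files whose extension does not map to an `image/*` type surfaces the problem locally instead of as a failed recognition on the server side.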

What should I do about an "Invalid token" error?

First, confirm that the API endpoint is correctly filled. If it still doesn't work, please generate a new token.

How to contact us?

The only official customer service QQ (others are scammers, please do not be deceived): 1244119140

What is the quota? How is it calculated?

The quota is calculated as follows:

Quota = Group Multiplier * Model Multiplier * (Prompt Tokens + Completion Tokens * Completion Multiplier)

Completion multiplier: fixed at 1.33 for GPT3.5 and at 2 for GPT4 (consistent with the official rates). Note: in non-stream mode the official API returns total token consumption, but prompt and completion tokens have different multipliers.
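
The formula can be written out directly as a short function. This is a sketch with hypothetical parameter names and illustrative token counts; the unit of the result is whatever unit the platform's multipliers are defined against, which the FAQ does not state.

```python
def quota(
    group_multiplier: float,
    model_multiplier: float,
    prompt_tokens: int,
    completion_tokens: int,
    completion_multiplier: float,
) -> float:
    """Quota = group * model * (prompt + completion * completion multiplier)."""
    weighted_tokens = prompt_tokens + completion_tokens * completion_multiplier
    return group_multiplier * model_multiplier * weighted_tokens


# Illustrative numbers only: default group (0.9), a 1x model multiplier,
# 1000 prompt tokens, 500 completion tokens, and the GPT4 completion multiplier of 2.
example = quota(0.9, 1.0, 1000, 500, 2.0)
print(example)
```

With these inputs the weighted token count is 1000 + 500 * 2 = 2000, and the quota is 0.9 * 1.0 * 2000.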

Why does it indicate insufficient quota when the account quota is sufficient?

Token quota and account quota are separate. Token quota only sets a maximum usage limit for a token, and you can set it freely. Please check whether the token's quota is sufficient.

Why does the log show an extra billing entry after using your built-in chat software?

This is normal. The chat software has an automatic title summarization feature enabled, which incurs negligible fees. If you mind, simply disable this feature.

Why are the input tokens for gpt-4o-mini image recognition always over 30,000?

This is normal because the token billing for image recognition in mini differs from other models. Please refer to the official documentation; our billing rules are consistent with theirs.

Why do I see "This model is not available in the group" or "the model has no usage permission"?

This is because some models are only available through specific channels. Please go to the model pricing page to see which groups can use this model, then click edit token on the token settings page to switch groups.

Why are some model multipliers more expensive than official?

For convenience, we use the default group as a universal group. In most cases, users only need the default group to access every model they want, without frequently switching groups. The drawback is that different model channels have different costs. Where the default group cannot cover a channel's cost, we raise the model multiplier, up to twice the official rate. If the cost exceeds twice the official rate, we may open a new group. For example, GPT models in the default group have the official 1x multiplier, so 0.9 yuan buys 1 USD of official quota, while Claude models cost more and have a 2x multiplier, so 1.8 yuan buys 1 USD of official quota. Apart from this, everything else is consistent with official pricing.
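
The arithmetic in this answer can be reproduced as follows. This is a sketch: the function name is hypothetical, and the recharge rate of 1 yuan per $1 of quota and the default group multiplier of 0.9 come from the listing and announcements above.

```python
def yuan_per_official_usd(
    model_multiplier: float,
    group_multiplier: float = 0.9,  # default group multiplier from the announcements
    recharge_rate: float = 1.0,     # yuan paid per $1 of quota, from the listing
) -> float:
    """Yuan spent to consume 1 USD of usage at official prices."""
    return recharge_rate * group_multiplier * model_multiplier


gpt_cost = yuan_per_official_usd(1.0)     # GPT example: 1x model multiplier
claude_cost = yuan_per_official_usd(2.0)  # Claude example: 2x model multiplier
print(gpt_cost, claude_cost)
```

The two calls correspond to the 0.9 yuan and 1.8 yuan figures quoted in the answer.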

Similar API Providers to Compare

SanShui API

api.aigpt4.top

Provides access to multiple AI models including GPT, Claude, DeepSeek, and Grok through a unified API interface.

132 shared models

钱多多 API

api2.aigcbest.top

Provides AI-generated content APIs for various applications, including text and image generation.

130 shared models

PoloAPI

poloai.top

An AI model API aggregation platform providing access to multiple providers including OpenAI, Claude, Gemini, and others for production environments.

119 shared models

柏拉图AI

api.bltcy.cn

柏拉图AI (api.bltcy.cn) is a multi-dimensional API integration platform providing access to over 600 AI models, serving as an alternate endpoint.

119 shared models

毫秒API

api.holdai.top

毫秒API provides a stable, high-bandwidth API forwarding service for OpenAI-compatible models, including GPT, Claude, and Midjourney, with global server deployment and transparent pricing.

119 shared models

小豆包API

api.linkapi.org

Xiaodoubao API is a Chinese AI API relay platform hosted at api.linkapi.org, advertising low-cost and high-concurrency access to OpenAI, Claude, Gemini, Grok, Flux, and agent workflows.

118 shared models

Data as of Apr 27, 2026, 07:25 AM. Rankings are based on community-submitted tests and periodic health probes. Advisory only, not official data.

Leaderboard Rankings

Health: 100.0% (#94/100)