An intelligent load-balancing platform that manages and distributes API requests across multiple AI providers.
**Benchmark summary (per model)**

| Model | Speed | Latency | Tests |
|---|---|---|---|
| llama3.1-8b | 2191.20 t/s | 0.35s | 10 |
| llama-4-scout-17b-16e-instruct | 1372.80 t/s | 0.36s | 5 |
| llama-3.3-70b | 1062.69 t/s | 0.51s | 5 |
| llama-4-maverick-17b-128e-instruct | 1052.78 t/s | 0.41s | 5 |
| qwen-3-coder-480b | 894.38 t/s | 0.35s | 5 |
| gpt-oss-120b | 846.32 t/s | 0.70s | 5 |
| qwen-3-235b-a22b-instruct-2507 | 754.92 t/s | 0.45s | 5 |
| qwen-3-32b | 705.04 t/s | 0.40s | 5 |
| qwen-3-235b-a22b-thinking-2507 | 579.82 t/s | 0.44s | 5 |
| models/gemini-2.5-flash | 180.81 t/s | 7.98s | 5 |
**Recent test runs**

| Time | Model | Speed | Latency |
|---|---|---|---|
| Sep 21, 06:22 PM | llama3.1-8b | 2264.49 t/s | 0.35s |
| Sep 21, 06:21 PM | llama-4-maverick-17b-128e-instruct | 1052.78 t/s | 0.41s |
| Sep 21, 06:21 PM | llama-4-scout-17b-16e-instruct | 1372.80 t/s | 0.36s |
| Sep 21, 06:19 PM | llama-3.3-70b | 1062.69 t/s | 0.51s |
| Sep 21, 06:18 PM | qwen-3-235b-a22b-thinking-2507 | 579.82 t/s | 0.44s |
| Sep 21, 06:18 PM | qwen-3-coder-480b | 894.38 t/s | 0.35s |
| Sep 21, 06:17 PM | llama3.1-8b | 2117.91 t/s | 0.34s |
| Sep 21, 06:16 PM | qwen-3-32b | 705.04 t/s | 0.40s |
| Sep 21, 06:16 PM | gpt-oss-120b | 846.32 t/s | 0.70s |
| Sep 21, 06:14 PM | qwen-3-235b-a22b-instruct-2507 | 754.92 t/s | 0.45s |
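Benchmark figures like the ones above can drive a routing decision. The sketch below is a hypothetical, minimal policy, not the platform's actual logic: it assumes each model carries a measured throughput and latency, filters out models over a latency budget, and routes to the fastest remaining one. The `ModelStats` type, the `pick_model` function, and the latency-budget rule are all illustrative assumptions.

```python
# Hypothetical sketch of a latency-aware routing policy. Names and the
# scoring rule are illustrative assumptions, not the platform's real code.
from dataclasses import dataclass


@dataclass
class ModelStats:
    name: str
    speed_tps: float   # measured throughput, tokens per second
    latency_s: float   # measured response latency, seconds


# A few figures taken from the benchmark summary above.
STATS = [
    ModelStats("llama3.1-8b", 2191.20, 0.35),
    ModelStats("llama-3.3-70b", 1062.69, 0.51),
    ModelStats("models/gemini-2.5-flash", 180.81, 7.98),
]


def pick_model(stats, max_latency_s=1.0):
    """Route to the fastest model whose latency fits the budget.

    Falls back to the lowest-latency model if nothing qualifies.
    """
    eligible = [s for s in stats if s.latency_s <= max_latency_s]
    if not eligible:
        return min(stats, key=lambda s: s.latency_s)
    return max(eligible, key=lambda s: s.speed_tps)


print(pick_model(STATS).name)  # llama3.1-8b
```

With the sample data, `models/gemini-2.5-flash` (7.98 s latency) is filtered out and the fastest remaining model wins; a real balancer would also weigh provider health, cost, and rate limits.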