Data points: 58
The readout for GPT-OSS and Llama 3.3, before the detailed comparison sheet.
Decision read
GPT-OSS
GPT-OSS currently has the stronger profile, with verified wins split 4 to 2.
Evidence depth
58 data points
Includes 0 benchmark rows, 0 audit samples, and 6 provider examples.
Selection signal
Start with GPT-OSS
The charts below split 6 high-signal samples across speed, scores, and audit health.
| Model compare: GPT-OSS vs Llama 3.3 | Model A: GPT-OSS | Model B: Llama 3.3 |
|---|---|---|
| Overall leader | Leading | Contender |
| Verified metric wins | 4 wins | 2 wins |
| Where it leads | Cheapest input price, Free providers, Provider coverage, Recent tests | Average speed, First-token latency |
Third-party benchmark profile synced into LMSpeed; only metrics available for both models are shown.
Benchmark category scores use a 0-100 scale. Avg. score: GPT-OSS 57.6; Llama 3.3 has no score (-). Selected category: Agents (GPT-OSS).
Metric-level scores with benchmark source, rank depth, confidence, error, and evaluation date where available.
No shared professional benchmark scores are available yet.
Latest completed audits from shared providers, with four safety and integrity score groups plus report links.
| Provider | GPT-OSS | Llama 3.3 |
|---|---|---|
| No completed audits are available from shared providers yet. | | |
Speed aggregates and input/output pricing share each provider row for real API selection and migration cost checks.
| Provider | GPT-OSS | Llama 3.3 |
|---|---|---|
| 25 tests | Speed / latency: 482 tok/s / 428 ms; input / output: No data | |
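To make the speed and latency aggregates easier to compare, here is a minimal sketch that estimates total response time as first-token latency plus streaming time. The constant-rate model, the function name `estimated_seconds`, and the 500-token output size are assumptions for illustration, not part of the LMSpeed methodology. The 482 tok/s / 428 ms pair is the GPT-OSS figure from the 25-test row; the 985 tok/s / 453 ms pair appears later on the page without a label, and treating it as the Llama 3.3 side of that row is also an assumption.

```python
def estimated_seconds(output_tokens, tokens_per_second, first_token_latency_ms):
    """Estimate response time: time to first token plus constant-rate streaming."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return first_token_latency_ms / 1000 + output_tokens / tokens_per_second


# Hypothetical output length, chosen only for illustration.
OUTPUT_TOKENS = 500

# GPT-OSS aggregate from the 25-test row: 482 tok/s, 428 ms.
gpt_oss_seconds = estimated_seconds(OUTPUT_TOKENS, 482, 428)

# Assumed to be the Llama 3.3 aggregate for the same row: 985 tok/s, 453 ms.
llama_seconds = estimated_seconds(OUTPUT_TOKENS, 985, 453)

print(f"GPT-OSS:   {gpt_oss_seconds:.2f} s")   # GPT-OSS:   1.47 s
print(f"Llama 3.3: {llama_seconds:.2f} s")     # Llama 3.3: 0.96 s
```

Under this simple model the higher throughput dominates for long outputs, while for very short outputs (roughly two dozen tokens or fewer with these figures) the lower first-token latency matters more.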
This report only uses LMSpeed data for GPT-OSS and Llama 3.3: pricing, speed aggregates, third-party benchmark scores, and shared provider samples.
| Guidance | GPT-OSS | Llama 3.3 |
|---|---|---|
| When to choose each model | GPT-OSS is stronger when you prioritize cheapest input price, free providers, provider coverage, and recent tests. | Llama 3.3 is stronger when you prioritize average speed and first-token latency. |
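The 4 to 2 split in verified metric wins matches the number of metrics each model is listed as leading. A minimal sketch of that tally follows; the dictionary layout and the function names `tally_wins` and `overall_leader` are hypothetical, and the page does not say how LMSpeed actually computes the leader.

```python
# Metric names copied from the guidance table; the data layout is an assumption.
leads = {
    "GPT-OSS": [
        "Cheapest input price",
        "Free providers",
        "Provider coverage",
        "Recent tests",
    ],
    "Llama 3.3": ["Average speed", "First-token latency"],
}


def tally_wins(leads_by_model):
    """Return a mapping of model name to the number of metrics it leads."""
    return {model: len(metrics) for model, metrics in leads_by_model.items()}


def overall_leader(wins):
    """Return the model with the most wins, or None when the top two tie."""
    ranked = sorted(wins.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


wins = tally_wins(leads)
print(wins)                  # {'GPT-OSS': 4, 'Llama 3.3': 2}
print(overall_leader(wins))  # GPT-OSS
```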
TL;DR: GPT-OSS leads on verified metric wins, 4 to 2, across 58 data points covering pricing, speed, latency, and provider examples. No shared benchmark scores or completed audits are available yet.
| Model metadata | GPT-OSS exposes a 131.1K-token context window; notable signals: Text input, Text output, Tool calling, Reasoning. | No OpenRouter metadata is available yet for this model. |
|---|---|---|
| Developer | No data | Meta |
| Context window | 131.1K tokens | No data |
| Max output | 131.1K tokens | No data |
| Released | Aug 2025 | No data |
| Modalities | Input: Text; Output: Text | No data |
| Features | Text input, Text output, Tool calling, Reasoning | None listed |
| Parameters | 120B | No data |
| Tokenizer | GPT | No data |
| Knowledge cutoff | 2024-06-30 | No data |
| OpenRouter ID | openai/gpt-oss-120b:free | No data |
| References | No data | No data |
Llama 3.3 (25 tests): speed / latency 985 tok/s / 453 ms; input / output: No data
| Provider | GPT-OSS | Llama 3.3 |
|---|---|---|
| APDSM (0 tests) | Speed / latency: N/A / N/A; input / output: No data | Speed / latency: N/A / N/A; input / output: No data |
| CHB API (0 tests) | openai/gpt-oss-20b; speed / latency: N/A / N/A; input / output: $1.03/M / $1.03/M | llama-3.3-70b; speed / latency: N/A / N/A; input / output: $1.03/M / $1.03/M |
| HotaruAPI (0 tests) | Speed / latency: N/A / N/A; input / output: No data | Speed / latency: N/A / N/A; input / output: No data |
| IXIOCCAPI (0 tests) | openai/gpt-oss-120b; speed / latency: N/A / N/A; input / output: $0.300/M / $1.50/M | llama-3.3-70b; speed / latency: N/A / N/A; input / output: $75.00/M / $75.00/M |
| | openai/gpt-oss-120b; speed / latency: No data; input / output: $1.23/M / $3.40/M | meta/llama-3.3-70b; speed / latency: No data; input / output: $15.41/M / $15.41/M |
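For a migration cost check, the input / output prices can be turned into a workload estimate. The sketch below uses the IXIOCCAPI prices listed above and assumes that "/M" means per million tokens; the workload size and the function name `workload_cost` are hypothetical and chosen only to illustrate the arithmetic.

```python
def workload_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the dollar cost of a workload given per-million-token prices."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# (input $/M, output $/M) as listed for IXIOCCAPI; "/M" is assumed to mean
# per million tokens.
PRICES = {
    "openai/gpt-oss-120b": (0.300, 1.50),
    "llama-3.3-70b": (75.00, 75.00),
}

# Hypothetical workload, not taken from the report.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 500_000

costs = {
    model: workload_cost(INPUT_TOKENS, OUTPUT_TOKENS, input_price, output_price)
    for model, (input_price, output_price) in PRICES.items()
}

for model, cost in costs.items():
    print(f"{model}: ${cost:.2f}")
# openai/gpt-oss-120b: $1.35
# llama-3.3-70b: $187.50
```

Since these provider rows show 0 tests and the rankings are advisory, an estimate like this is best treated as a rough comparison and checked against the provider's current price list.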
Rankings are based on community-submitted tests and periodic health probes. Advisory only, not official data.