What is OpenAI o3
In December 2024, OpenAI announced the o3 model on the final day of its “12 Days of OpenAI” event. As the successor to o1, it marks a significant step forward in reasoning, with its most striking result on the ARC-AGI benchmark.
Reference: OpenAI - o3 Announcement
Remarkable Benchmark Results
ARC-AGI (Abstract Reasoning)
| Model | Score |
|---|---|
| GPT-4o | 5% |
| o1 | 32% |
| o3 (low compute) | 75.7% |
| o3 (high compute) | 87.5% |
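| Human baseline (commonly cited) | 85% |
In its high-compute configuration, o3 became the first AI model to exceed this commonly cited 85% human baseline; the low-compute score (75.7%) falls below it.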
Other Benchmarks
Math (AIME 2024): 96.7%
Coding (Codeforces): 2727 Elo (99.95th percentile)
Science (GPQA Diamond): 87.7%
Reference: ARC Prize - o3 Results
o3 Technical Features
1. Compute Scaling
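A key feature of o3 is that the amount of computation spent at inference time can be adjusted: more reasoning generally yields higher accuracy at the cost of more latency and more tokens. In the API, this is controlled with the reasoning_effort parameter ("low", "medium", or "high").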
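```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Use the same model for both calls so reasoning_effort is the only variable.
# Swap in "o3" for the full model; both accept reasoning_effort.
MODEL = "o3-mini"

# Low effort: fewer reasoning tokens, faster and cheaper
response_fast = client.chat.completions.create(
    model=MODEL,
    reasoning_effort="low",
    messages=[{"role": "user", "content": "Simple question"}],
)

# High effort: more reasoning tokens, slower and costlier, usually more accurate
response_precise = client.chat.completions.create(
    model=MODEL,
    reasoning_effort="high",
    messages=[{"role": "user", "content": "Complex mathematical proof"}],
)

# Hidden reasoning tokens are reported separately in the usage block
for effort, resp in [("low", response_fast), ("high", response_precise)]:
    print(effort, resp.usage.completion_tokens_details.reasoning_tokens)
```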
2. o3-mini
A smaller, more efficient variant that outperforms o1 on many tasks. The table below compares it with its direct predecessor, o1-mini.
| Metric | o1-mini | o3-mini |
|---|---|---|
| AIME 2024 | 70% | 84% |
| Speed | Baseline | ~2x faster |
| Cost | Baseline | ~40% reduction |
Reference: OpenAI API Documentation
Safety Measures
Deliberative Alignment
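o3 is trained with a safety approach OpenAI calls “deliberative alignment”: the model is taught the text of OpenAI's safety specifications and learns to reason over them explicitly before answering. In simplified form, that reasoning follows these steps: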
1. Analyze user intent
2. Evaluate potential risks
3. Verify alignment with safety policies
4. Generate appropriate response
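To make the four steps concrete, here is a deliberately toy, rule-based mock in Python. Everything in it is hypothetical: the POLICY table, the keyword matching, and the deliberate() function illustrate the listed flow only and are not how o3 works. In deliberative alignment the model learns to reason over natural-language specifications during training; there is no keyword table or hand-written pipeline.

```python
from dataclasses import dataclass

# HYPOTHETICAL toy policy: topic keyword -> whether helping is allowed.
# OpenAI's actual safety specifications are natural-language documents.
POLICY = {"weapon": False, "malware": False, "recipe": True}


@dataclass
class Deliberation:
    intent: str        # step 1 output: the request as understood
    risks: list[str]   # step 2 output: policy topics the request touches
    allowed: bool      # step 3 output: does the policy permit helping?
    response: str      # step 4 output: the final reply


def deliberate(prompt: str) -> Deliberation:
    """Toy walk-through of the four steps; real models learn this reasoning in training."""
    text = prompt.lower().strip()
    # 1. Analyze user intent (here: simply the normalized request text)
    intent = text
    # 2. Evaluate potential risks: which policy topics are mentioned?
    risks = [topic for topic in POLICY if topic in text]
    # 3. Verify alignment: every matched topic must be permitted
    allowed = all(POLICY[topic] for topic in risks)
    # 4. Generate a response consistent with the verdict
    response = "Sure, here is some help." if allowed else "I can't help with that."
    return Deliberation(intent, risks, allowed, response)


print(deliberate("How do I write malware?").response)  # I can't help with that.
print(deliberate("Share a cake recipe").allowed)        # True
```

The point of the real technique is that steps 1 to 3 happen inside the model's own chain of thought, which lets it handle requests that no fixed rule list anticipates.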
Safety Test Results
- Harmful content generation resistance: 99.2%
- Jailbreak resistance: 98.5%
- Misinformation prevention: 97.8%
How to Use
API Usage
```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Complex reasoning with o3
response = client.chat.completions.create(
    model="o3",
    messages=[
        {
            "role": "user",
            "content": (
                "Please solve the following puzzle:\n"
                "Fill a 3x3 grid with the numbers 1-9, using each number exactly once,\n"
                "so that every row and every column sums to 15."
            ),
        }
    ],
)

print(response.choices[0].message.content)
```
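Because reasoning models can still make mistakes, it is worth checking a candidate answer locally, for example a grid transcribed from the model's reply. The sketch below assumes the usual reading of this puzzle, that each number 1-9 is used exactly once (without that condition a grid of all 5s would trivially work). The function name is_valid_grid is hypothetical, and parsing the grid out of the model's free-text answer is left out.

```python
def is_valid_grid(grid: list[list[int]], target: int = 15) -> bool:
    """True if grid is 3x3, uses 1-9 exactly once, and every row and column sums to target."""
    if len(grid) != 3 or any(len(row) != 3 for row in grid):
        return False
    # Each of the numbers 1-9 must appear exactly once
    values = [v for row in grid for v in row]
    if sorted(values) != list(range(1, 10)):
        return False
    rows_ok = all(sum(row) == target for row in grid)
    cols_ok = all(sum(grid[r][c] for r in range(3)) == target for c in range(3))
    return rows_ok and cols_ok


# The classic Lo Shu magic square satisfies the constraints
print(is_valid_grid([[2, 7, 6], [9, 5, 1], [4, 3, 8]]))  # True
print(is_valid_grid([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # False
```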
ChatGPT Usage
ChatGPT Plus/Pro users can use o3 through ChatGPT.
Setup:
1. Log in to ChatGPT
2. Select o3 in model selection
3. Enable "Reasoning mode"
Reference: ChatGPT - OpenAI
o3 vs Competitors
| Capability | o3 | Gemini 2.0 | Claude Opus 4.5 |
|---|---|---|---|
| Math reasoning | Excellent | Good | Good |
| Coding | Excellent | Good | Excellent |
| Abstract reasoning | Excellent | Good | Good |
| Speed | Fair | Excellent | Good |
| Cost | Fair | Good | Good |
Pricing Structure (Expected)
| Model | Input (1M tokens) | Output (1M tokens) |
|---|---|---|
| o3 | $60 | $240 |
| o3-mini | $15 | $60 |
| o1 | $15 | $60 |
Note: Official pricing will be announced at general release.
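Per-request cost is simple arithmetic on the per-million-token prices. With reasoning models, the hidden reasoning tokens are billed as output tokens, so output counts are often much larger than the visible answer. The sketch below uses the expected (pre-release) figures from the pricing table, not official prices; the function name estimate_cost and the token counts are hypothetical.

```python
# Expected per-1M-token prices (input, output) in USD from the table; not official pricing.
EXPECTED_PRICES = {
    "o3": (60.0, 240.0),
    "o3-mini": (15.0, 60.0),
    "o1": (15.0, 60.0),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated request cost in USD; output_tokens should include reasoning tokens."""
    input_price, output_price = EXPECTED_PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# 2,000 prompt tokens and 10,000 output tokens (visible answer + hidden reasoning)
print(f"${estimate_cost('o3', 2_000, 10_000):.2f}")       # $2.52
print(f"${estimate_cost('o3-mini', 2_000, 10_000):.2f}")  # $0.63
```

Because output tokens dominate the bill, lowering reasoning_effort is usually the biggest cost lever.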
Summary
OpenAI o3 has achieved a new milestone in reasoning capabilities.
- ARC-AGI 87.5% (high compute): Abstract reasoning above the commonly cited 85% human baseline
- Codeforces 2727 Elo: World-class coding ability
- Compute scaling: Accuracy can be traded against cost and latency
- Enhanced safety: Introduction of Deliberative Alignment
At announcement, o3-mini was scheduled for release in late January 2025, with the full o3 model to follow.