On the final day of its “12 days of OpenAI” announcements, OpenAI saved the biggest update for last: the o3 and o3-mini reasoning models. Most notably, o3 became the first AI model to crack the ARC-AGI benchmark, which had gone unbeaten for five years.
On the ARC-AGI Semi-Private Evaluation Set, OpenAI’s o3 model scored 87.5% when run in a high-compute configuration that gives it more time to think. The ARC Prize threshold is set at 85%, close to what humans generally achieve. For comparison, the OpenAI o1 model scored only 32%.
ARC-AGI is designed to test AI models for generalized intelligence, focusing on the ability to solve novel problems rather than relying on memorized patterns. By that measure, o3 marks a historic result for OpenAI. It may bring the company closer to achieving AGI (Artificial General Intelligence), an AI system that can match or exceed human intelligence.
Besides ARC-AGI, OpenAI o3 scored 71.7% on SWE-bench Verified, reached a rating of 2,727 on Codeforces, and scored 96.7% on AIME 2024 and 87.7% on GPQA Diamond. All of these tests are highly challenging, and the scores are significantly higher than what o1 achieved. Finally, on Epoch AI’s FrontierMath benchmark, where a single problem can take expert mathematicians hours to solve, OpenAI o3 reached 25.2% accuracy. The earlier best score was just 2.0%.
As for o3-mini, OpenAI says it is a model distilled from o3 and optimized for coding, fast performance, and cost-efficiency. It has three compute settings: low, medium, and high. At the medium setting, o3-mini outperforms the larger o1 model while costing less and responding with lower latency.
In case you are wondering why it is called o3 and not o2, OpenAI decided to skip the o2 name altogether to avoid legal issues with O2, the UK-based mobile network operator.
As for availability, OpenAI says it is currently performing safety testing on the o3 and o3-mini models. The company is also opening up o3-mini for public safety testing. OpenAI plans to release o3-mini by the end of January 2025. The full o3 model will follow, after rigorous testing and approval by regulators.