OpenAI Unveils o3 Model and Becomes First to Crack the ARC-AGI Benchmark in 5 Years

December 20, 2024


On the final day of its “12 days of OpenAI” announcements, OpenAI saved the biggest update for last. The company announced the o3 and o3-mini reasoning models and, most notably, made history as o3 became the first AI model to crack the hallowed ARC-AGI benchmark, which had gone unbeaten for five years.

Image: OpenAI o3 on the ARC-AGI benchmark. Image Credit: OpenAI via YouTube

On the ARC-AGI Semi-Private Evaluation Set, OpenAI’s o3 model scored a whopping 87.5% when using high-compute resources and given more time to think. The ARC Prize threshold is set at 85%, close to what humans generally achieve. For comparison, OpenAI’s o1 model could only score 32%.

ARC-AGI is designed to test AI models for generalized intelligence, focusing on the ability to solve novel problems rather than relying on memorized patterns. With the o3 model, OpenAI has achieved a historic result on this benchmark. It may bring OpenAI closer to achieving AGI (Artificial General Intelligence), an AI system that can match or exceed human intelligence.

Besides ARC-AGI, OpenAI o3 scored 71.7% on SWE-bench Verified, an Elo rating of 2,727 on Codeforces, 96.7% on AIME 2024, and 87.7% on GPQA Diamond. All of these tests are highly challenging, and the scores are significantly higher than what o1 achieved. Finally, on EpochAI’s Frontier Math benchmark, where a single problem can take expert mathematicians hours to solve, OpenAI o3 reached 25.2% accuracy. The earlier best score was just 2.0%.

As for the o3-mini model, OpenAI says it is distilled from o3 and optimized for coding, fast performance, and cost-efficiency. o3-mini has three compute settings: low, medium, and high. At the medium setting, o3-mini outperforms the larger o1 model while costing less. Its latency is also lower than that of the o1 model.

In case you are wondering why it is called o3 and not o2: to avoid legal issues with O2, the UK-based mobile network operator, OpenAI decided to skip o2 altogether.

Finally, on availability, OpenAI says it is performing safety testing on the o3 and o3-mini models. The company is also opening up the o3-mini model for public safety testing. OpenAI plans to release the o3-mini model by the end of January 2025. The o3 model will follow after rigorous testing and approval by regulators.

Arjun Sha

Passionate about Windows, ChromeOS, Android, security and privacy issues. Have a penchant to solve everyday computing problems.
