Google has released Gemini 2.5 Pro, a new AI model that scored 18.8% on Humanity’s Last Exam (HLE) without using web search or any other tools. HLE is a rigorous benchmark designed by subject matter experts and academics from around the world to test in-depth knowledge across a wide range of subjects. Previously, OpenAI’s o3-mini-high achieved 14% on the same benchmark, also without tools.
Gemini 2.5 Pro is a thinking model, that is, a reasoning model built on top of a larger base LLM using reinforcement learning and chain-of-thought prompting. Before Gemini 2.5 Pro, Google had released the smaller Gemini 2.0 Flash Thinking model.
Google says the Gemini 2.5 Pro model can “analyze information, draw logical conclusions, incorporate context and nuance, and make informed decisions.”
Before its release, Gemini 2.5 Pro was tested on LMArena under the codename “nebula”. It has now taken the top position on the LMArena leaderboard with a score of 1,443, beating Grok 3 and GPT-4.5. As for other benchmarks, Google says Gemini 2.5 Pro performs exceptionally well in coding, math, and science.
On GPQA Diamond, Gemini 2.5 Pro scored 84%; on AIME 2025, it achieved 86.7%. On SWE-bench Verified, a benchmark that tests the ability to solve real-world software issues, Gemini 2.5 Pro scored 63.8%, second only to Claude 3.7 Sonnet Extended Thinking, which scored 70.3%.
Google says the new Gemini 2.5 Pro model is capable of advanced coding and reasoning. It’s rolling out to Gemini Advanced users. Those who want to test the Gemini 2.5 Pro model for free can head to Google AI Studio and select the “Gemini 2.5 Pro Experimental 03-25” model from the drop-down menu.
Source: Beebom