The AI race took a giant leap forward after Google DeepMind and OpenAI announced that their models had achieved gold-medal scores at the International Mathematical Olympiad (IMO), marking the first time AI systems have reached the gold-medal standard in the world's most elite math contest.
Google’s Gemini Deep Think solved five of six Olympiad problems entirely in natural language within the 4.5-hour time limit, while OpenAI’s experimental model, using extended reasoning and parallel computing, achieved equivalent results independently verified by IMO gold medalists.
“This breakthrough proves AI can now tackle some of the hardest reasoning problems in a way that could redefine mathematical research,” said Junehyuk Jung, a Brown University professor and visiting researcher at DeepMind.
OpenAI researcher Noam Brown explained that the achievement relied on massively scaling "test-time compute," which lets the model explore and evaluate many reasoning paths in parallel. While he declined to disclose the exact cost, he called the approach "extremely expensive but transformative."
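Neither lab has published its method, but the general idea behind scaling test-time compute can be illustrated with a simple best-of-n scheme: sample many independent reasoning passes and aggregate their final answers by majority vote (often called self-consistency). The sketch below is purely illustrative; `sample_solution` is a hypothetical stand-in for a full model reasoning pass, simulated here with a noisy random process.

```python
import random
from collections import Counter

def sample_solution(problem: str, seed: int) -> str:
    """Hypothetical stand-in for one model reasoning pass.
    A real system would run a full chain-of-thought here; we
    simulate noisy reasoning where the correct answer ("42")
    comes back about 70% of the time."""
    rng = random.Random(seed)
    return "42" if rng.random() < 0.7 else str(rng.randint(0, 9))

def best_of_n(problem: str, n: int = 32) -> str:
    """Scale test-time compute: draw n independent reasoning
    paths and return the most common final answer."""
    answers = [sample_solution(problem, seed) for seed in range(n)]
    return Counter(answers).most_common(1)[0][0]

print(best_of_n("What is 6 * 7?", n=64))
```

Spending more compute here simply means raising `n`: each extra sample makes the majority answer more reliable, which is why the approach is powerful but, as Brown notes, expensive.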
Google DeepMind, whose model reached only silver-level performance last year, worked officially with IMO judges to certify its results, while OpenAI published its findings shortly after the competition's closing ceremony.
Beyond mathematics, both labs believe AI reasoning capabilities could soon be applied to complex physics, engineering, and unsolved scientific puzzles.
This year's IMO hosted 630 students in Queensland, Australia, where only 11% achieved gold-medal status. That AI models have now joined this elite group marks a historic moment for artificial intelligence.