Math Benchmarking - Search News

FrontierMath Benchmark Exposes AI Struggles in Advanced Math

VentureBeat

New open-source math model Light-R1-32B surpasses equivalent DeepSeek performance with only $1000 in training costs

Researchers have introduced Light-R1-32B, a new open-source AI model optimized to solve advanced math problems. It is now available on Hugging Face under a permissive Apache 2.0 license — free for ...

25d

Nvidia's Nemotron-Cascade 2 wins math and coding gold medals with 3B active parameters — and its post-training recipe is now open-source

Nvidia's Nemotron-Cascade 2 is a 30B MoE model that activates only 3B parameters at inference time, yet achieved gold medal-level performance at the 2025 IMO, IOI, and ICPC World Finals. Nvidia has ...

The National Law Review

ORCA Benchmark Shows That AI Frequently Fumbles Everyday Math

AI is actually bad at math, ORCA shows

OpenAI GPT score on FrontierMath Benchmark by June 30?

Move over math and reasoning, it's time to benchmark AI using Super Mario Bros.

Google Gemini score on FrontierMath Benchmark by June 30?