Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation (Paper • 2502.19414 • Published 3 days ago)
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 (Paper • 2502.03544 • Published 24 days ago)
Great Models Think Alike and this Undermines AI Oversight (Paper • 2502.04313 • Published 23 days ago)