How to Get Your LLM to Generate Challenging Problems for Evaluation Paper • 2502.14678 • Published 8 days ago • 16
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published 8 days ago • 92
Diverse Inference and Verification for Advanced Reasoning Paper • 2502.09955 • Published 15 days ago • 16
DarwinLM: Evolutionary Structured Pruning of Large Language Models Paper • 2502.07780 • Published 17 days ago • 17 • 7
Expect the Unexpected: FailSafe Long Context QA for Finance Paper • 2502.06329 • Published 18 days ago • 124
An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging Paper • 2502.09056 • Published 16 days ago • 30