ai2-adapt-dev/eurus2_ground_truth_with_random_max_length Viewer • Updated about 19 hours ago • 455k • 53
ACECODER: Acing Coder RL via Automated Test-Case Synthesis Paper • 2502.01718 • Published 25 days ago • 28
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning Paper • 2502.01100 • Published 26 days ago • 16
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them Paper • 2501.08292 • Published Jan 14 • 17