ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning Paper • 2502.01100 • Published 26 days ago • 16
Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena Paper • 2310.05746 • Published Oct 9, 2023 • 1
SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals Paper • 2406.04784 • Published Jun 7, 2024 • 1
SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories Paper • 2409.07440 • Published Sep 11, 2024 • 8
Argunauts: Open LLMs that Master Argument Analysis with Argdown Article • By ggbetz • 14 days ago • 2