DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 1 day ago
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model Paper • 2501.12368 • Published 2 days ago
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments Paper • 2501.10893 • Published 5 days ago
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published 3 days ago
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding Paper • 2411.04282 • Published Nov 6, 2024
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published 29 days ago
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning Paper • 2501.06458 • Published 13 days ago
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682 • Published 15 days ago
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 16 days ago
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM Paper • 2501.01904 • Published 20 days ago
Test-time Computing: from System-1 Thinking to System-2 Thinking Paper • 2501.02497 • Published 19 days ago