- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 24
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 26
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 108
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 4
Collections including paper arxiv:2502.15589
- Large Language Models Can Self-Improve in Long-context Reasoning
  Paper • 2411.08147 • Published • 64
- Reverse Thinking Makes LLMs Stronger Reasoners
  Paper • 2411.19865 • Published • 22
- Training Large Language Models to Reason in a Continuous Latent Space
  Paper • 2412.06769 • Published • 78
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 97

- PAS: Data-Efficient Plug-and-Play Prompt Augmentation System
  Paper • 2407.06027 • Published • 9
- SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
  Paper • 2407.09025 • Published • 134
- Toto: Time Series Optimized Transformer for Observability
  Paper • 2407.07874 • Published • 32
- SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
  Paper • 2407.09413 • Published • 11

- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
  Paper • 2405.14333 • Published • 40
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
  Paper • 2404.12253 • Published • 55
- Improve Mathematical Reasoning in Language Models by Automated Process Supervision
  Paper • 2406.06592 • Published • 28
- Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
  Paper • 2406.07394 • Published • 27