Collections
Discover the best community collections!
Collections including paper arxiv:2412.16849
-
ProcessBench: Identifying Process Errors in Mathematical Reasoning
Paper • 2412.06559 • Published • 74 -
Maya: An Instruction Finetuned Multilingual Multimodal Model
Paper • 2412.07112 • Published • 26 -
OpenAI o1 System Card
Paper • 2412.16720 • Published • 31 -
Diving into Self-Evolving Training for Multimodal Reasoning
Paper • 2412.17451 • Published • 42
-
Extending Llama-3's Context Ten-Fold Overnight
Paper • 2404.19553 • Published • 33 -
ReFT: Representation Finetuning for Language Models
Paper • 2404.03592 • Published • 91 -
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck
Paper • 2404.07647 • Published • 4 -
SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning
Paper • 2401.07950 • Published • 4
-
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 67 -
Thinking LLMs: General Instruction Following with Thought Generation
Paper • 2410.10630 • Published • 18 -
Democratizing Reasoning Ability: Tailored Learning from Large Language Model
Paper • 2310.13332 • Published • 14 -
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
Paper • 2412.16849 • Published • 8
-
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Paper • 2411.11504 • Published • 20 -
Top-nσ: Not All Logits Are You Need
Paper • 2411.07641 • Published • 19 -
Adaptive Decoding via Latent Preference Optimization
Paper • 2411.09661 • Published • 10 -
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Paper • 2411.13476 • Published • 15
-
ORPO: Monolithic Preference Optimization without Reference Model
Paper • 2403.07691 • Published • 64 -
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 40 -
Teaching Large Language Models to Reason with Reinforcement Learning
Paper • 2403.04642 • Published • 46 -
Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper • 2404.07503 • Published • 29