Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2412.16849

OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning

Paper • 2412.16849 • Published 21 days ago • 8
Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published 22 days ago • 38

interest_need_read

感兴趣热门论文集合

ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published Dec 9, 2024 • 74
Maya: An Instruction Finetuned Multilingual Multimodal Model

Paper • 2412.07112 • Published Dec 10, 2024 • 26
OpenAI o1 System Card

Paper • 2412.16720 • Published 21 days ago • 31
Diving into Self-Evolving Training for Multimodal Reasoning

Paper • 2412.17451 • Published 20 days ago • 42

Extending Llama-3's Context Ten-Fold Overnight

Paper • 2404.19553 • Published Apr 30, 2024 • 33
ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4, 2024 • 91
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck

Paper • 2404.07647 • Published Apr 11, 2024 • 4
SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning

Paper • 2401.07950 • Published Jan 15, 2024 • 4

Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

Paper • 2408.06195 • Published Aug 12, 2024 • 67
Thinking LLMs: General Instruction Following with Thought Generation

Paper • 2410.10630 • Published Oct 14, 2024 • 18
Democratizing Reasoning Ability: Tailored Learning from Large Language Model

Paper • 2310.13332 • Published Oct 20, 2023 • 14
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning

Paper • 2412.16849 • Published 21 days ago • 8

Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering

Paper • 2411.11504 • Published Nov 18, 2024 • 20
Top-nσ: Not All Logits Are You Need

Paper • 2411.07641 • Published Nov 12, 2024 • 19
Adaptive Decoding via Latent Preference Optimization

Paper • 2411.09661 • Published Nov 14, 2024 • 10
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training

Paper • 2411.13476 • Published Nov 20, 2024 • 15

ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12, 2024 • 64
sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28, 2024 • 40
Teaching Large Language Models to Reason with Reinforcement Learning

Paper • 2403.04642 • Published Mar 7, 2024 • 46
Best Practices and Lessons Learned on Synthetic Data for Language Models

Paper • 2404.07503 • Published Apr 11, 2024 • 29

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs