SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 2025
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding Paper • 2501.13200 • Published Jan 2025
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published Jan 2025
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 2025
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models Paper • 2501.03124 • Published Jan 2025
Test-time Computing: from System-1 Thinking to System-2 Thinking Paper • 2501.02497 • Published Jan 2025
Personalized Multimodal Large Language Models: A Survey Paper • 2412.02142 • Published Dec 3, 2024
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published Dec 4, 2024
LM-Cocktail: Resilient Tuning of Language Models via Model Merging Paper • 2311.13534 • Published Nov 22, 2023
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned variants in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024
LIBMoE: A Library for Comprehensive Benchmarking of Mixture of Experts in Large Language Models Paper • 2411.00918 • Published Nov 1, 2024
Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality Paper • 2410.05210 • Published Oct 7, 2024