Gemma 2: Improving Open Language Models at a Practical Size Paper • 2408.00118 • Published Jul 31, 2024 • 76
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning Paper • 2407.15762 • Published Jul 22, 2024 • 10
BOND: Aligning LLMs with Best-of-N Distillation Paper • 2407.14622 • Published Jul 19, 2024 • 19
WARP: On the Benefits of Weight Averaged Rewarded Policies Paper • 2406.16768 • Published Jun 24, 2024 • 22
WARP: On the Benefits of Weight Averaged Rewarded Policies Paper • 2406.16768 • Published Jun 24, 2024 • 22
WARP: On the Benefits of Weight Averaged Rewarded Policies Paper • 2406.16768 • Published Jun 24, 2024 • 22 • 1
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12, 2024 • 39
WARM: On the Benefits of Weight Averaged Reward Models Paper • 2401.12187 • Published Jan 22, 2024 • 18 • 7
WARM: On the Benefits of Weight Averaged Reward Models Paper • 2401.12187 • Published Jan 22, 2024 • 18 • 7
Direct Language Model Alignment from Online AI Feedback Paper • 2402.04792 • Published Feb 7, 2024 • 29
WARM: On the Benefits of Weight Averaged Reward Models Paper • 2401.12187 • Published Jan 22, 2024 • 18 • 7
WARM: On the Benefits of Weight Averaged Reward Models Paper • 2401.12187 • Published Jan 22, 2024 • 18
WARM: On the Benefits of Weight Averaged Reward Models Paper • 2401.12187 • Published Jan 22, 2024 • 18
Unified Model for Image, Video, Audio and Language Tasks Paper • 2307.16184 • Published Jul 30, 2023 • 15