Accelerated Preference Optimization for Large Language Model Alignment Paper • 2410.06293 • Published Oct 8, 2024 • 5
MARS: Unleashing the Power of Variance Reduction for Training Large Models Paper • 2411.10438 • Published Nov 15, 2024 • 13
Self-Play Preference Optimization for Language Model Alignment Paper • 2405.00675 • Published May 1, 2024 • 25
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models Paper • 2401.01335 • Published Jan 2, 2024 • 64
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation Paper • 2402.10210 • Published Feb 15, 2024 • 32