DataComp-LM: In search of the next generation of training sets for language models • arXiv:2406.11794 • Published Jun 17, 2024
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis • arXiv:2410.02749 • Published Oct 2024
Towards a Unified View of Preference Learning for Large Language Models: A Survey • arXiv:2409.02795 • Published Sep 4, 2024
ORPO: Monolithic Preference Optimization without Reference Model • arXiv:2403.07691 • Published Mar 12, 2024
Intuitive Fine-Tuning: Towards Unifying SFT and RLHF into a Single Process • arXiv:2405.11870 • Published May 20, 2024