-
A General Theoretical Paradigm to Understand Learning from Human Preferences
Paper • 2310.12036 • Published • 14 -
ORPO: Monolithic Preference Optimization without Reference Model
Paper • 2403.07691 • Published • 62 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 48
jarvis
jarvis8x7b
AI & ML interests
None yet
Organizations
None yet
Collections
1
models
None public yet
datasets
None public yet