About ORPO

alvarobartt 's Collections

updated Sep 2, 2024

Contains some information and experiments fine-tuning LLMs using 🤗 `trl.ORPOTrainer`

ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12, 2024 • 64

Note Annotated paper and personal notes coming soon!
HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1

Text Generation • Updated Apr 18, 2024 • 599 • 265

Note ORPO full fine-tune of `mistral-community/Mixtral-8x22B-v0.1` with `argilla/distilabel-capybara-dpo-7k-binarized` with ChatML formatting (in collaboration with Hugging Face, Argilla and Kaist AI)
alvarobartt/mistral-orpo-mix

Text Generation • Updated Mar 24, 2024 • 7 • 1

Note ORPO full fine-tune of `mistralai/Mistral-7B-v0.1` with `alvarobartt/dpo-mix-7k-simplified` with ChatML formatting
alvarobartt/Mistral-7B-v0.1-ORPO

Text Generation • Updated Mar 23, 2024 • 13 • 14

Note ORPO fine-tune of `mistralai/Mistral-7B-v0.1` with `alvarobartt/dpo-mix-7k-simplified` with ChatML formatting using QLoRA (weights are merged into base model)
alvarobartt/Mistral-7B-v0.1-ORPO-PEFT

Text Generation • Updated Mar 23, 2024 • 4 • 1

Note ORPO fine-tune of `mistralai/Mistral-7B-v0.1` with `alvarobartt/dpo-mix-7k-simplified` with ChatML formatting using QLoRA (only contains the adapter, for merged weights check `alvarobartt/Mistral-7B-v0.1-ORPO`)
alvarobartt/mistral-orpo-mix-b0.05-l1024-pl512-lr5e-7-cosine

Text Generation • Updated Mar 26, 2024 • 6
alvarobartt/mistral-orpo-mix-b0.1-l2048-pl1792-lr5e-6-inverse-sqrt

Text Generation • Updated Mar 26, 2024 • 5