About ORPO
Contains some information and experiments fine-tuning LLMs using 🤗 `trl.ORPOTrainer`
Paper • 2403.07691 • Published • 63Note Annotated paper and personal notes coming soon!
HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
Text Generation • Updated • 225 • 262Note ORPO full fine-tune of `mistral-community/Mixtral-8x22B-v0.1` with `argilla/distilabel-capybara-dpo-7k-binarized` with ChatML formatting (in collaboration with Hugging Face, Argilla and Kaist AI)
alvarobartt/mistral-orpo-mix
Text Generation • Updated • 28 • 1Note ORPO full fine-tune of `mistralai/Mistral-7B-v0.1` with `alvarobartt/dpo-mix-7k-simplified` with ChatML formatting
alvarobartt/Mistral-7B-v0.1-ORPO
Text Generation • Updated • 45 • 14Note ORPO fine-tune of `mistralai/Mistral-7B-v0.1` with `alvarobartt/dpo-mix-7k-simplified` with ChatML formatting using QLoRA (weights are merged into base model)
alvarobartt/Mistral-7B-v0.1-ORPO-PEFT
Text Generation • Updated • 4 • 1Note ORPO fine-tune of `mistralai/Mistral-7B-v0.1` with `alvarobartt/dpo-mix-7k-simplified` with ChatML formatting using QLoRA (only contains the adapter, for merged weights check `alvarobartt/Mistral-7B-v0.1-ORPO`)
alvarobartt/mistral-orpo-mix-b0.05-l1024-pl512-lr5e-7-cosine
Text Generation • Updated • 25alvarobartt/mistral-orpo-mix-b0.1-l2048-pl1792-lr5e-6-inverse-sqrt
Text Generation • Updated • 24