
ORPO

Updates (24.03.25)

 

This is the official repository for ORPO: Monolithic Preference Optimization without Reference Model. The detailed results from the paper can be found below.
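As a rough sketch of the idea behind ORPO: the objective combines the usual supervised NLL on the chosen response with a log-odds-ratio penalty that pushes the policy to prefer chosen over rejected responses, with no reference model involved. The snippet below is illustrative only; the variable names and the weight `lam` are assumptions for this sketch, not values taken from this repository.

```python
import numpy as np

def orpo_loss(logp_chosen, logp_rejected, lam=0.1):
    """Illustrative ORPO objective for a single preference pair.

    logp_chosen / logp_rejected: average per-token log-probabilities of
    the chosen and rejected responses under the policy model (both < 0).
    lam: weight on the odds-ratio term (illustrative value, an assumption).
    """
    # log odds(y|x) = log p - log(1 - p), computed stably from log p
    log_odds_chosen = logp_chosen - np.log1p(-np.exp(logp_chosen))
    log_odds_rejected = logp_rejected - np.log1p(-np.exp(logp_rejected))
    # odds-ratio term: -log sigmoid(log odds-ratio)
    log_or = log_odds_chosen - log_odds_rejected
    l_or = -np.log(1.0 / (1.0 + np.exp(-log_or)))
    # total loss: NLL on the chosen response plus the weighted OR penalty
    l_sft = -logp_chosen
    return l_sft + lam * l_or
```

When the two responses are equally likely the penalty reduces to `-log(0.5)`, and it shrinks as the chosen response's odds grow relative to the rejected one's.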

Model Checkpoints

Our models trained with ORPO can be found in:

And the corresponding logs for the average log probabilities of chosen/rejected responses during training are reported in:

 

AlpacaEval

Figure 1. AlpacaEval 2.0 scores for models trained with different alignment methods.

 

MT-Bench

Figure 2. MT-Bench results by category.

 

IFEval

IFEval scores are measured with EleutherAI/lm-evaluation-harness with the chat template applied. The scores for Llama-2-Chat (70B), Zephyr-β (7B), and Mixtral-8X7B-Instruct-v0.1 were originally reported in this tweet.

| Model Type                 | Prompt-Strict | Prompt-Loose | Inst-Strict | Inst-Loose |
|----------------------------|---------------|--------------|-------------|------------|
| Llama-2-Chat (70B)         | 0.4436        | 0.5342       | 0.5468      | 0.6319     |
| Zephyr-β (7B)              | 0.4233        | 0.4547       | 0.5492      | 0.5767     |
| Mixtral-8X7B-Instruct-v0.1 | 0.5213        | 0.5712       | 0.6343      | 0.6823     |
| Mistral-ORPO-⍺ (7B)        | 0.5009        | 0.5083       | 0.5995      | 0.6163     |
| Mistral-ORPO-β (7B)        | 0.5287        | 0.5564       | 0.6355      | 0.6619     |
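For readers unfamiliar with the two granularities in the table: IFEval reports prompt-level accuracy (a prompt counts only if every instruction in it is followed) and instruction-level accuracy (each instruction counts individually). A minimal sketch of that distinction, with a made-up helper name and toy data:

```python
def ifeval_accuracies(results):
    """Hypothetical helper illustrating IFEval's two accuracy levels.

    results: one list of booleans per prompt, one boolean per
    instruction in that prompt (True = instruction followed).
    Returns (prompt_level_accuracy, instruction_level_accuracy).
    """
    # prompt-level: a prompt passes only if ALL its instructions pass
    prompt_acc = sum(all(r) for r in results) / len(results)
    # instruction-level: every instruction is scored independently
    flat = [ok for r in results for ok in r]
    inst_acc = sum(flat) / len(flat)
    return prompt_acc, inst_acc
```

Instruction-level accuracy is always at least as high as prompt-level accuracy, which matches the ordering of the columns above. The "strict" and "loose" variants differ only in how leniently a response is matched against each instruction.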