Mistral-ORPO-β (7B)

Mistral-ORPO is a fine-tuned version of mistralai/Mistral-7B-v0.1 trained with odds ratio preference optimization (ORPO). With ORPO, the model learns preferences directly, without a separate supervised fine-tuning (SFT) warm-up phase. Mistral-ORPO-β is fine-tuned exclusively on the 61k instances of argilla/ultrafeedback-binarized-preferences-cleaned, Argilla's cleaned version of UltraFeedback.
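Following the ORPO paper cited below, the objective adds an odds-ratio penalty to the ordinary negative log-likelihood loss on the chosen response: L_ORPO = L_SFT + λ · L_OR. Here L_OR = -log σ(log(odds(y_w|x) / odds(y_l|x))) and odds(y|x) = P(y|x) / (1 - P(y|x)), where y_w is the chosen and y_l the rejected response. The sketch below computes this for a single preference pair from per-token log-probabilities given as plain floats. It assumes P(y|x) is the length-normalized likelihood (the mean token log-probability, exponentiated). The value λ = 0.1 and the helper names are hypothetical and for illustration only; this is not the authors' training code.

```python
import math
from typing import Sequence, Tuple


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-likelihood: (1/m) * sum_t log P(y_t | x, y_<t)."""
    return sum(token_logprobs) / len(token_logprobs)


def log_odds(avg_logprob: float) -> float:
    """log odds(y|x) = log P - log(1 - P), with P = exp(avg_logprob) < 1."""
    return avg_logprob - math.log1p(-math.exp(avg_logprob))


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)) = log(1 + exp(-z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def orpo_loss(
    chosen_logprobs: Sequence[float],
    rejected_logprobs: Sequence[float],
    lam: float = 0.1,  # hypothetical weight of the odds-ratio term
) -> Tuple[float, float, float]:
    """Return (L_ORPO, L_SFT, L_OR) for one (chosen, rejected) pair."""
    lp_w = mean_logprob(chosen_logprobs)
    lp_l = mean_logprob(rejected_logprobs)
    l_sft = -lp_w  # NLL of the chosen response
    l_or = neg_log_sigmoid(log_odds(lp_w) - log_odds(lp_l))
    return l_sft + lam * l_or, l_sft, l_or


# Toy pair: the chosen response is more likely than the rejected one
total, l_sft, l_or = orpo_loss([-0.5, -0.7, -0.3], [-1.2, -1.5, -2.0])
print(f"L_SFT={l_sft:.4f}  L_OR={l_or:.4f}  L_ORPO={total:.4f}")
```

L_OR falls below log 2 when the chosen response has higher odds than the rejected one and rises above it when the order is reversed. This is why no reference model or SFT warm-up is needed to push the two apart.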

Github Repository: https://github.com/xfactlab/orpo

👍 Model Performance

1) AlpacaEval & MT-Bench

Model Name	Size	Alignment	MT-Bench	AlpacaEval 1.0	AlpacaEval 2.0
Mistral-ORPO-⍺	7B	ORPO	7.23	87.92	11.33
Mistral-ORPO-β	7B	ORPO	7.32	91.41	12.20
Zephyr β	7B	DPO	7.34	90.60	10.99
TULU-2-DPO	13B	DPO	7.00	89.5	10.12
Llama-2-Chat	7B	RLHF	6.27	71.37	4.96
Llama-2-Chat	13B	RLHF	6.65	81.09	7.70

2) IFEval

Model	Prompt-Strict	Prompt-Loose	Inst-Strict	Inst-Loose
Mistral-ORPO-⍺	0.5009	0.5083	0.5995	0.6163
Mistral-ORPO-β	0.5287	0.5564	0.6355	0.6619
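IFEval reports accuracy at two granularities. Prompt-level accuracy counts a prompt as correct only if every instruction in it is followed. Instruction-level accuracy is the fraction of all individual instructions that are followed. The strict and loose variants differ only in how each response is checked: loose also accepts lightly normalized versions of the response. The minimal sketch below shows the prompt-level versus instruction-level aggregation. It assumes per-instruction pass/fail flags have already been computed by a checker, which is not shown. The input format and the function name `ifeval_accuracies` are hypothetical.

```python
from typing import List, Tuple


def ifeval_accuracies(per_prompt_results: List[List[bool]]) -> Tuple[float, float]:
    """Aggregate per-instruction pass/fail flags into IFEval-style accuracies.

    per_prompt_results[i][j] is True if the response to prompt i followed
    instruction j of that prompt (hypothetical input format).
    """
    # Prompt-level: a prompt counts only if all of its instructions pass
    prompt_level = sum(all(flags) for flags in per_prompt_results) / len(per_prompt_results)
    # Instruction-level: every instruction is counted individually
    total_instructions = sum(len(flags) for flags in per_prompt_results)
    followed = sum(sum(flags) for flags in per_prompt_results)
    return prompt_level, followed / total_instructions


# Three prompts with 2, 2 and 1 instructions respectively
prompt_acc, inst_acc = ifeval_accuracies([[True, True], [True, False], [False]])
print(round(prompt_acc, 4), inst_acc)  # → 0.3333 0.6
```

Because a single failed instruction sinks the whole prompt, prompt-level scores are always at most the matching instruction-level scores. This matches the pattern in the table above.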

🗺️ MT-Bench by Category

🖥️ Inference

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kaist-ai/mistral-orpo-beta"
# The checkpoint is stored in BF16; loading it in bfloat16 avoids upcasting to float32
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Apply chat template
query = [{'role': 'user', 'content': 'Hi! How are you doing?'}]
prompt = tokenizer.apply_chat_template(query, tokenize=False, add_generation_prompt=True)
# Move the input tensors to the same device as the model
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)

# Generation with specific configurations (sampling, so outputs vary between runs)
output = model.generate(
  **inputs,
  max_new_tokens=128,
  do_sample=True,
  temperature=0.7
)
response = tokenizer.batch_decode(output)
print(response[0])

# Example output:
#<|user|>
#Hi! How are you doing?</s>
#<|assistant|>
#I'm doing well, thank you! How are you?</s>
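To build the same prompt outside of transformers (for example, for a different inference server), you can approximate the layout in the sample output with plain string formatting. The sketch below is inferred from that sample and is not the tokenizer's official template. The `<|role|>` tags and the `</s>` turn separator are assumptions taken from the output shown, and a leading BOS token is not modeled. The helpers `format_chat` and `last_assistant_reply` are hypothetical. Prefer `tokenizer.apply_chat_template` whenever the tokenizer is available.

```python
from typing import Dict, List


def format_chat(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Approximate the prompt layout seen in the sample output (assumed, not official)."""
    # Each turn: "<|role|>\n{content}</s>\n"
    parts = [f"<|{m['role']}|>\n{m['content']}</s>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn for the model to complete
        parts.append("<|assistant|>\n")
    return "".join(parts)


def last_assistant_reply(decoded: str) -> str:
    """Return the text after the final <|assistant|> tag, cut at the first </s>."""
    reply = decoded.rsplit("<|assistant|>\n", 1)[-1]
    return reply.split("</s>", 1)[0].strip()


prompt = format_chat([{"role": "user", "content": "Hi! How are you doing?"}])
print(prompt)
```

`last_assistant_reply` is a convenience for the full transcript that `batch_decode` returns. Decoding only the newly generated tokens with `skip_special_tokens=True` achieves a similar result.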

📎 Citation

@misc{hong2024orpo,
      title={ORPO: Monolithic Preference Optimization without Reference Model}, 
      author={Jiwoo Hong and Noah Lee and James Thorne},
      year={2024},
      eprint={2403.07691},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}


Model size: 7.24B params (Safetensors, tensor type BF16)


Evaluation results

Benchmark	Metric	Split	Source	Score
AI2 Reasoning Challenge (25-shot)	normalized accuracy	test	Open LLM Leaderboard	61.180
HellaSwag (10-shot)	normalized accuracy	validation	Open LLM Leaderboard	84.030
TruthfulQA (0-shot)	mc2	validation	Open LLM Leaderboard	47.690
GSM8k (5-shot)	accuracy	test	Open LLM Leaderboard	39.800
MMLU (5-shot)	accuracy	test	Open LLM Leaderboard	63.260
Winogrande (5-shot)	accuracy	validation	Open LLM Leaderboard	79.240
AlpacaEval 1	win rate	n/a	Leaderboard	91.16%
AlpacaEval 2	win rate	n/a	Leaderboard	12.57%
MT-Bench	score	n/a	self-reported	7.322
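The original Open LLM Leaderboard summarized these six benchmarks with an unweighted average. The sketch below reproduces that arithmetic using only the six scores listed above. It assumes a plain mean; the leaderboard's own rounding is not reproduced.

```python
# The six Open LLM Leaderboard scores reported above for kaist-ai/mistral-orpo-beta
open_llm_scores = {
    "ARC (25-shot)": 61.18,
    "HellaSwag (10-shot)": 84.03,
    "TruthfulQA (0-shot, mc2)": 47.69,
    "GSM8k (5-shot)": 39.80,
    "MMLU (5-shot)": 63.26,
    "Winogrande (5-shot)": 79.24,
}

# Unweighted mean across the six benchmarks
average = sum(open_llm_scores.values()) / len(open_llm_scores)
print(f"Open LLM Leaderboard average: {average:.2f}")  # → 62.53
```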
