metadata
license: mit
library_name: transformers
tags:
- trl
- orpo
- generated_from_trainer
datasets:
- argilla/distilabel-capybara-dpo-7k-binarized
base_model: wandb/Mistral-7B-v0.2
Mistral 7B Zephyr Orpo
This model applies the Zephyr ORPO recipe to the Mistral 7B v0.2 base model (a new recipe on a new Mistral base model).
Model description
- Model type: A 7.2B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
- Language(s) (NLP): Primarily English
- Finetuned from model: wandb/Mistral-7B-v0.2
Recipe
We trained with the alignment handbook recipe and logged the run to W&B.
Visit the W&B workspace here
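For readers unfamiliar with ORPO, the sketch below illustrates the general shape of the objective: a supervised (negative log-likelihood) term on the chosen response plus a weighted odds-ratio penalty that favors the chosen response over the rejected one. It is a minimal illustration under stated assumptions, not the training code used for this model. The function names, the use of length-averaged log-probabilities as inputs, and the weight `lam=0.1` are all hypothetical choices made for the example; the actual hyperparameters are in the recipe and the W&B run.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    avg_logp is assumed to be a length-averaged sequence log-probability,
    so it must be strictly negative (p < 1).
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """Illustrative per-example ORPO-style loss (lam is a hypothetical weight)."""
    # Log odds ratio between the chosen and the rejected response.
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    # Supervised term: negative log-likelihood of the chosen response.
    sft_term = -chosen_avg_logp
    return sft_term + lam * odds_ratio_term


# The loss is lower when the chosen response is the more likely one.
good = orpo_loss(chosen_avg_logp=-0.5, rejected_avg_logp=-2.0)
bad = orpo_loss(chosen_avg_logp=-2.0, rejected_avg_logp=-0.5)
print(good < bad)
```

Because the penalty is added to the ordinary supervised loss, this style of objective needs only preference pairs (such as the binarized dataset listed above) and no separate reference model.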
Results:
- MT-Bench scores for zephyr-orpo-7b-v0.2:

  Turn       Score
  First      7.44375
  Second     6.875
  Average    7.159375
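The reported average can be checked against the two per-turn scores. The sketch below assumes that both turns cover the same number of judged questions, in which case the overall mean equals the mean of the two per-turn means; the helper name `mt_bench_average` is hypothetical and is not part of any benchmark tooling.

```python
# Per-turn MT-Bench scores reported above for zephyr-orpo-7b-v0.2.
turn_scores = {1: 7.44375, 2: 6.875}


def mt_bench_average(scores: dict[int, float]) -> float:
    """Mean of per-turn scores, assuming equal question counts per turn."""
    return sum(scores.values()) / len(scores)


average = mt_bench_average(turn_scores)
print(average)
```

The result agrees with the reported average of 7.159375.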