|
--- |
|
license: cc-by-nc-4.0 |
|
base_model: mlabonne/Marcoro14-7B-slerp |
|
datasets: |
|
- argilla/distilabel-intel-orca-dpo-pairs |
|
--- |
|
|
|
# Model Card for decruz07/kellemar-DPO-Orca-Distilled-7B |
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
This model was created using mlabonne/Marcoro14-7B-slerp as the base and fine-tuned with DPO on argilla/distilabel-intel-orca-dpo-pairs.
|
|
|
|
|
## Model Details |
|
|
|
Fine-tuned with the following parameters:

- Steps: 200
- Learning Rate: 5e-5
- Beta: 0.1
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
- **Developed by:** @decruz |
|
- **Funded by [optional]:** my full-time job |
|
- **Finetuned from model [optional]:** mlabonne/Marcoro14-7B-slerp |
|
|
|
## Benchmarks |
|
Top 5 on the Open LLM Leaderboard as of 2024/01/17.
|
|
|
**OpenLLM** |
|
|Model| Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | |
|
|---|---|---|---|---|---|---|---| |
|
|**kellemar-DPO-Orca-Distilled-7B-SLERP**| 73.71 | 70.48 | 87.56 | 65.33 |64.97 | 81.93 | 72.02 | |
|
|
|
**Nous** |
|
|Model| AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|
|---|---|---|---|---|---| |
|
|**kellemar-DPO-Orca-Distilled-7B-SLERP**| 45.27 | 76.42 | 65.48 | 47.21 |58.6 | |
|
|Marcoro14-7B-slerp| 44.66 | 76.24 | 64.15 | 45.64 |57.67 | |
|
|kellemar-DPO-Orca-Distilled-7B| 43.61 | 73.14 | 55.73 | 42.28 |53.69 | |
|
|kellemar-Orca-DPO-7B| 43.35 | 73.43 | 54.02 | 42.24 |53.26 | |
|
|OpenHermes-2.5-Mistral-7B| 43.07 | 73.12 | 53.04 | 40.96 |52.38 | |
|
|
|
## Uses |
|
|
|
You can use this model for basic inference. It can also serve as a base for further fine-tuning.
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
You can create a Space out of this, or call the model directly from Python to run inference.
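A minimal inference sketch is shown below, assuming the `transformers` library is installed; the prompt and generation settings are illustrative only and not prescribed by this model card.

```python
# Minimal inference sketch (assumes `transformers` and `torch` are installed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "decruz07/kellemar-DPO-Orca-Distilled-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Example prompt; adapt to your preferred chat/prompt format.
prompt = "Explain the difference between supervised fine-tuning and DPO in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```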
|
|
|
|
|
|
## Training Details |
|
|
|
The following training configuration was used (`model`, `ref_model`, `tokenizer`, `dataset`, `peft_config`, and `new_model` are defined earlier in the training script):

```python
from transformers import TrainingArguments
from trl import DPOTrainer

training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
```
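The model, reference model, and LoRA configuration are not shown above. Below is a hedged sketch of what that setup typically looks like for this kind of DPO recipe; the LoRA rank, alpha, dropout, and target modules are illustrative assumptions, not values confirmed by this card.

```python
# Illustrative setup sketch; hyperparameters and target modules are assumptions,
# not values confirmed by this model card.
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "mlabonne/Marcoro14-7B-slerp"
new_model = "kellemar-DPO-Orca-Distilled-7B"

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# Policy model to be optimized with DPO.
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
# Frozen reference model used for the DPO KL penalty.
ref_model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# Example LoRA configuration (assumed values).
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```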
|
|
|
### Training Data |
|
|
|
This was trained with https://huggingface.co./datasets/argilla/distilabel-intel-orca-dpo-pairs |
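A hedged sketch of loading and mapping the dataset into the prompt/chosen/rejected format DPOTrainer expects; the column names follow the dataset's published schema, but verify them against the dataset card before use.

```python
from datasets import load_dataset

# Dataset preparation sketch; column names are an assumption based on the
# dataset's schema and typical DPO recipes, not confirmed by this model card.
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

# DPOTrainer expects "prompt", "chosen", and "rejected" text columns.
def to_dpo_format(example):
    return {
        "prompt": example["input"],
        "chosen": example["chosen"],
        "rejected": example["rejected"],
    }

dataset = dataset.map(to_dpo_format)
```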
|
|
|
### Training Procedure |
|
|
|
Trained with Maxime Labonne's Google Colab notebook on fine-tuning Mistral 7B with DPO.
|
|
|
## Model Card Authors [optional] |
|
|
|
@decruz |
|
|
|
## Model Card Contact |
|
|
|
@decruz on X/Twitter |