---
license: apache-2.0
base_model: mosaicml/mpt-7b-instruct
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: MPT_1000_STEPS_1e5_rate_01_beta_DPO
  results: []
---

# MPT_1000_STEPS_1e5_rate_01_beta_DPO

This model is a fine-tuned version of [mosaicml/mpt-7b-instruct](https://huggingface.co./mosaicml/mpt-7b-instruct) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.8946
- Rewards/chosen: -4.4962
- Rewards/rejected: -4.4462
- Rewards/accuracies: 0.4901
- Rewards/margins: -0.0501
- Logps/rejected: -66.0193
- Logps/chosen: -65.7547
- Logits/rejected: 8.4623
- Logits/chosen: 8.4615

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of this setup appears at the end of this card):
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7056 | 0.05 | 50 | 0.9054 | -1.8795 | -1.8769 | 0.4857 | -0.0027 | -40.3261 | -39.5876 | 13.2447 | 13.2474 |
| 1.3284 | 0.1 | 100 | 1.3365 | -5.2198 | -5.1996 | 0.4835 | -0.0202 | -73.5531 | -72.9898 | 40.0297 | 40.0297 |
| 4.0395 | 0.15 | 150 | 1.2940 | -5.6920 | -5.6131 | 0.4637 | -0.0789 | -77.6884 | -77.7120 | 34.5576 | 34.5577 |
| 1.1998 | 0.2 | 200 | 1.1437 | -4.4153 | -4.3103 | 0.4747 | -0.1050 | -64.6601 | -64.9452 | 14.5309 | 14.5309 |
| 1.0001 | 0.24 | 250 | 1.3580 | -5.0983 | -5.0232 | 0.5033 | -0.0751 | -71.7890 | -71.7751 | 24.0739 | 24.0735 |
| 1.1726 | 0.29 | 300 | 1.0394 | -4.1980 | -4.0831 | 0.4879 | -0.1149 | -62.3888 | -62.7721 | 16.4743 | 16.4742 |
| 1.0955 | 0.34 | 350 | 1.0584 | -4.9210 | -4.7783 | 0.4747 | -0.1427 | -69.3404 | -70.0020 | 20.7178 | 20.7172 |
| 1.2598 | 0.39 | 400 | 1.0408 | -3.8776 | -3.8210 | 0.4945 | -0.0566 | -59.7678 | -59.5681 | 17.0600 | 17.0587 |
| 1.2403 | 0.44 | 450 | 0.9855 | -4.8112 | -4.6991 | 0.4747 | -0.1121 | -68.5488 | -68.9046 | 10.9237 | 10.9226 |
| 1.2967 | 0.49 | 500 | 0.9814 | -4.7410 | -4.6563 | 0.4769 | -0.0846 | -68.1207 | -68.2017 | 15.1832 | 15.1825 |
| 1.152 | 0.54 | 550 | 0.9258 | -4.6800 | -4.6273 | 0.4989 | -0.0527 | -67.8303 | -67.5925 | 9.7415 | 9.7409 |
| 0.9473 | 0.59 | 600 | 0.9416 | -3.6301 | -3.6600 | 0.5341 | 0.0299 | -58.1573 | -57.0931 | 10.5794 | 10.5787 |
| 0.9534 | 0.64 | 650 | 0.9361 | -4.7539 | -4.6806 | 0.4681 | -0.0733 | -68.3630 | -68.3308 | 11.2450 | 11.2442 |
| 0.985 | 0.68 | 700 | 0.9194 | -4.5437 | -4.5232 | 0.5011 | -0.0205 | -66.7896 | -66.2292 | 9.1942 | 9.1934 |
| 0.97 | 0.73 | 750 | 0.9090 | -4.6508 | -4.5989 | 0.4835 | -0.0520 | -67.5462 | -67.3006 | 8.0813 | 8.0806 |
| 0.8148 | 0.78 | 800 | 0.8992 | -4.5695 | -4.5180 | 0.4923 | -0.0515 | -66.7373 | -66.4875 | 8.3458 | 8.3450 |
| 0.9668 | 0.83 | 850 | 0.8976 | -4.5172 | -4.4650 | 0.4901 | -0.0521 | -66.2078 | -65.9638 | 8.2885 | 8.2877 |
| 0.9438 | 0.88 | 900 | 0.8952 | -4.4950 | -4.4441 | 0.4923 | -0.0509 | -65.9988 | -65.7424 | 8.4833 | 8.4825 |
| 1.0069 | 0.93 | 950 | 0.8954 | -4.4971 | -4.4461 | 0.4901 | -0.0510 | -66.0188 | -65.7634 | 8.4615 | 8.4607 |
| 0.7377 | 0.98 | 1000 | 0.8946 | -4.4962 | -4.4462 | 0.4901 | -0.0501 | -66.0193 | -65.7547 | 8.4623 | 8.4615 |

### Framework versions

- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
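
### Training setup sketch

The dataset used for this run is not documented, so the following is a minimal sketch of the training setup rather than an exact recipe. It assumes a TRL version contemporary with Transformers 4.39 (where `DPOTrainer` still accepts `beta` directly); `your_preference_dataset` is a placeholder for a preference dataset with `prompt`, `chosen`, and `rejected` columns, and `beta=0.1` is inferred from the "01_beta" part of the model name.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mosaicml/mpt-7b-instruct"

# MPT uses custom modeling code, so trust_remote_code is required.
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
ref_model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # MPT's tokenizer has no pad token by default

# Placeholder: the actual preference dataset for this card is unknown.
dataset = load_dataset("your_preference_dataset", split="train")

# Mirrors the hyperparameters listed above; the optimizer is left at the
# Trainer default, which uses betas=(0.9, 0.999) and epsilon=1e-08.
training_args = TrainingArguments(
    output_dir="MPT_1000_STEPS_1e5_rate_01_beta_DPO",
    learning_rate=1e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,  # frozen reference policy for the DPO loss
    beta=0.1,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```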