---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mistralit2_1000_STEPS_1e6_rate_05_beta_DPO
  results: []
---


# mistralit2_1000_STEPS_1e6_rate_05_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6223
- Rewards/chosen: -1.9087
- Rewards/rejected: -2.8966
- Rewards/accuracies: 0.6593
- Rewards/margins: 0.9879
- Logps/rejected: -34.3656
- Logps/chosen: -27.2032
- Logits/rejected: -2.8455
- Logits/chosen: -2.8459
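
The reward metrics above are related: in TRL's DPO logging, the margin is the mean difference between the chosen and rejected rewards, so it should equal `Rewards/chosen` minus `Rewards/rejected`. A minimal sketch that checks this against the reported values (the variable names are illustrative, not part of the training code):

```python
# Values copied from the evaluation results listed above.
rewards_chosen = -1.9087
rewards_rejected = -2.8966
reported_margin = 0.9879

# Margin = chosen reward minus rejected reward.
margin = rewards_chosen - rewards_rejected

print(f"computed margin: {margin:.4f}")
print(f"matches reported value: {abs(margin - reported_margin) < 1e-4}")
```

A positive margin means the model assigns a higher implicit reward to the chosen responses than to the rejected ones, even though both rewards are negative here.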

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
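
To illustrate how these settings interact, the sketch below computes the effective batch size and the shape of a cosine schedule with linear warmup. It assumes a single device and the default half-cosine decay used by Transformers' `get_cosine_schedule_with_warmup`; the function name `lr_multiplier` is hypothetical, and the result is a factor applied to the peak learning rate rather than the learning rate itself.

```python
import math

# Values copied from the hyperparameter list above.
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000

# Effective batch size per optimizer step (assumption: one device).
total_train_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS


def lr_multiplier(step, warmup_steps=WARMUP_STEPS, total_steps=TRAINING_STEPS):
    """Factor in [0, 1] applied to the peak learning rate at a given step."""
    if step < warmup_steps:
        # Linear warmup from 0 to 1.
        return step / max(1, warmup_steps)
    # Half-cosine decay from 1 to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("total_train_batch_size:", total_train_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, round(lr_multiplier(step), 4))
```

Under these assumptions the learning rate peaks at step 100, falls to half its peak at step 550, and reaches zero at step 1000.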

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6684        | 0.1   | 50   | 0.6660          | -0.2264        | -0.2957          | 0.5934             | 0.0693          | -29.1637       | -23.8386     | -2.8636         | -2.8639       |
| 0.5945        | 0.2   | 100  | 0.6396          | -1.5064        | -1.9635          | 0.6044             | 0.4572          | -32.4994       | -26.3985     | -2.8444         | -2.8447       |
| 0.4899        | 0.29  | 150  | 0.6602          | -2.2474        | -2.9308          | 0.6022             | 0.6835          | -34.4341       | -27.8806     | -2.8445         | -2.8448       |
| 0.5517        | 0.39  | 200  | 0.6024          | -0.7758        | -1.2571          | 0.6418             | 0.4813          | -31.0867       | -24.9374     | -2.8613         | -2.8616       |
| 0.6385        | 0.49  | 250  | 0.5703          | -0.5516        | -1.1264          | 0.6703             | 0.5749          | -30.8253       | -24.4890     | -2.8571         | -2.8574       |
| 0.5653        | 0.59  | 300  | 0.5989          | -1.4256        | -2.1727          | 0.6440             | 0.7471          | -32.9178       | -26.2370     | -2.8464         | -2.8467       |
| 0.5255        | 0.68  | 350  | 0.6054          | -1.6264        | -2.4443          | 0.6484             | 0.8179          | -33.4610       | -26.6386     | -2.8533         | -2.8536       |
| 0.6612        | 0.78  | 400  | 0.6157          | -1.7163        | -2.5329          | 0.6418             | 0.8166          | -33.6383       | -26.8185     | -2.8530         | -2.8533       |
| 0.646         | 0.88  | 450  | 0.6016          | -1.1753        | -1.8651          | 0.6440             | 0.6898          | -32.3026       | -25.7364     | -2.8525         | -2.8529       |
| 0.5146        | 0.98  | 500  | 0.5957          | -1.1531        | -1.8752          | 0.6484             | 0.7221          | -32.3227       | -25.6920     | -2.8553         | -2.8556       |
| 0.297         | 1.07  | 550  | 0.5863          | -1.2310        | -2.0319          | 0.6571             | 0.8009          | -32.6362       | -25.8478     | -2.8539         | -2.8542       |
| 0.2709        | 1.17  | 600  | 0.6234          | -1.7413        | -2.6395          | 0.6527             | 0.8982          | -33.8514       | -26.8684     | -2.8489         | -2.8493       |
| 0.4008        | 1.27  | 650  | 0.6173          | -1.8482        | -2.8001          | 0.6549             | 0.9519          | -34.1726       | -27.0823     | -2.8472         | -2.8476       |
| 0.2846        | 1.37  | 700  | 0.6222          | -1.8576        | -2.8175          | 0.6505             | 0.9599          | -34.2075       | -27.1011     | -2.8466         | -2.8470       |
| 0.2129        | 1.46  | 750  | 0.6233          | -1.8931        | -2.8716          | 0.6571             | 0.9785          | -34.3156       | -27.1720     | -2.8458         | -2.8462       |
| 0.3026        | 1.56  | 800  | 0.6224          | -1.9044        | -2.8881          | 0.6593             | 0.9837          | -34.3486       | -27.1947     | -2.8455         | -2.8458       |
| 0.3361        | 1.66  | 850  | 0.6242          | -1.9113        | -2.9007          | 0.6659             | 0.9894          | -34.3738       | -27.2085     | -2.8456         | -2.8460       |
| 0.2965        | 1.76  | 900  | 0.6223          | -1.9123        | -2.8982          | 0.6615             | 0.9859          | -34.3687       | -27.2103     | -2.8456         | -2.8460       |
| 0.2779        | 1.86  | 950  | 0.6213          | -1.9078        | -2.8977          | 0.6593             | 0.9900          | -34.3678       | -27.2013     | -2.8455         | -2.8459       |
| 0.2334        | 1.95  | 1000 | 0.6223          | -1.9087        | -2.8966          | 0.6593             | 0.9879          | -34.3656       | -27.2032     | -2.8455         | -2.8459       |
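
The table can be scanned programmatically to see which evaluation step scores best on each metric. The sketch below copies four columns from the table; it is an illustrative helper, not part of the training pipeline.

```python
# (step, validation_loss, rewards_accuracy, rewards_margin), copied from the table above.
rows = [
    (50, 0.6660, 0.5934, 0.0693),
    (100, 0.6396, 0.6044, 0.4572),
    (150, 0.6602, 0.6022, 0.6835),
    (200, 0.6024, 0.6418, 0.4813),
    (250, 0.5703, 0.6703, 0.5749),
    (300, 0.5989, 0.6440, 0.7471),
    (350, 0.6054, 0.6484, 0.8179),
    (400, 0.6157, 0.6418, 0.8166),
    (450, 0.6016, 0.6440, 0.6898),
    (500, 0.5957, 0.6484, 0.7221),
    (550, 0.5863, 0.6571, 0.8009),
    (600, 0.6234, 0.6527, 0.8982),
    (650, 0.6173, 0.6549, 0.9519),
    (700, 0.6222, 0.6505, 0.9599),
    (750, 0.6233, 0.6571, 0.9785),
    (800, 0.6224, 0.6593, 0.9837),
    (850, 0.6242, 0.6659, 0.9894),
    (900, 0.6223, 0.6615, 0.9859),
    (950, 0.6213, 0.6593, 0.9900),
    (1000, 0.6223, 0.6593, 0.9879),
]

best_loss = min(rows, key=lambda r: r[1])      # lowest validation loss
best_accuracy = max(rows, key=lambda r: r[2])  # highest reward accuracy
best_margin = max(rows, key=lambda r: r[3])    # largest reward margin

print("lowest validation loss:", best_loss[0], best_loss[1])
print("highest reward accuracy:", best_accuracy[0], best_accuracy[2])
print("largest reward margin:", best_margin[0], best_margin[3])
```

On these numbers, validation loss and reward accuracy are both best at step 250, while the reward margin is largest at step 950. The results reported at the top of this card correspond to the final step, 1000.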


### Framework versions

- Transformers 4.38.2
- PyTorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2