---
library_name: transformers
license: apache-2.0
base_model: tsavage68/IE_M2_1000steps_1e7rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: IE_M2_1000steps_1e7rate_03beta_SFT
  results: []
---

# IE_M2_1000steps_1e7rate_03beta_SFT

This model is a fine-tuned version of [tsavage68/IE_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/IE_M2_1000steps_1e7rate_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3743
- Rewards/chosen: -0.4432
- Rewards/rejected: -6.7623
- Rewards/accuracies: 0.4600
- Rewards/margins: 6.3191
- Logps/rejected: -63.5627
- Logps/chosen: -43.6829
- Logits/rejected: -2.8851
- Logits/chosen: -2.8225

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
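Below is a minimal sketch of a `trl` DPO setup matching these hyperparameters, for readers who want to reproduce the configuration. Two assumptions are not documented on this card: the beta value of 0.3 is inferred from the "03beta" in the model name, and the preference dataset is a hypothetical placeholder.

```python
# Sketch of a DPO run with the hyperparameters listed above (trl of the era
# contemporary with the framework versions below). Assumptions: beta=0.3 is
# inferred from the model name, and the dataset path is a placeholder.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/IE_M2_1000steps_1e7rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

config = DPOConfig(
    output_dir="IE_M2_1000steps_1e7rate_03beta_SFT",
    beta=0.3,                       # assumption: implied by "03beta" in the name
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

# Hypothetical preference dataset with "prompt"/"chosen"/"rejected" columns.
train_dataset = load_dataset("your/preference-dataset", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # trl clones the policy as the frozen reference model
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```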
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4682        | 0.4   | 50   | 0.3782          | -0.1103        | -2.3818          | 0.4600             | 2.2716          | -48.9613       | -42.5731     | -2.9040         | -2.8424       |
| 0.3812        | 0.8   | 100  | 0.3743          | -0.3057        | -5.2338          | 0.4600             | 4.9281          | -58.4679       | -43.2247     | -2.8913         | -2.8290       |
| 0.3119        | 1.2   | 150  | 0.3743          | -0.4620        | -6.2918          | 0.4600             | 5.8298          | -61.9944       | -43.7454     | -2.8899         | -2.8276       |
| 0.3639        | 1.6   | 200  | 0.3743          | -0.4045        | -6.1963          | 0.4600             | 5.7918          | -61.6762       | -43.5540     | -2.8874         | -2.8248       |
| 0.4332        | 2.0   | 250  | 0.3743          | -0.4216        | -6.3719          | 0.4600             | 5.9503          | -62.2614       | -43.6108     | -2.8860         | -2.8234       |
| 0.3986        | 2.4   | 300  | 0.3743          | -0.4257        | -6.4310          | 0.4600             | 6.0053          | -62.4585       | -43.6244     | -2.8858         | -2.8233       |
| 0.3986        | 2.8   | 350  | 0.3743          | -0.4206        | -6.4901          | 0.4600             | 6.0695          | -62.6555       | -43.6075     | -2.8857         | -2.8232       |
| 0.4505        | 3.2   | 400  | 0.3743          | -0.4331        | -6.5613          | 0.4600             | 6.1281          | -62.8927       | -43.6493     | -2.8859         | -2.8233       |
| 0.4505        | 3.6   | 450  | 0.3743          | -0.4385        | -6.6329          | 0.4600             | 6.1945          | -63.1316       | -43.6671     | -2.8854         | -2.8229       |
| 0.4332        | 4.0   | 500  | 0.3743          | -0.4451        | -6.6895          | 0.4600             | 6.2444          | -63.3203       | -43.6893     | -2.8853         | -2.8227       |
| 0.3292        | 4.4   | 550  | 0.3743          | -0.4424        | -6.7191          | 0.4600             | 6.2766          | -63.4188       | -43.6803     | -2.8853         | -2.8227       |
| 0.3639        | 4.8   | 600  | 0.3743          | -0.4424        | -6.7393          | 0.4600             | 6.2969          | -63.4861       | -43.6801     | -2.8854         | -2.8228       |
| 0.4505        | 5.2   | 650  | 0.3743          | -0.4464        | -6.7495          | 0.4600             | 6.3031          | -63.5201       | -43.6934     | -2.8852         | -2.8225       |
| 0.4505        | 5.6   | 700  | 0.3743          | -0.4436        | -6.7510          | 0.4600             | 6.3074          | -63.5251       | -43.6842     | -2.8853         | -2.8227       |
| 0.3639        | 6.0   | 750  | 0.3743          | -0.4452        | -6.7582          | 0.4600             | 6.3130          | -63.5491       | -43.6895     | -2.8852         | -2.8225       |
| 0.2426        | 6.4   | 800  | 0.3743          | -0.4492        | -6.7644          | 0.4600             | 6.3152          | -63.5699       | -43.7027     | -2.8854         | -2.8227       |
| 0.5025        | 6.8   | 850  | 0.3743          | -0.4443        | -6.7593          | 0.4600             | 6.3150          | -63.5528       | -43.6864     | -2.8850         | -2.8224       |
| 0.3119        | 7.2   | 900  | 0.3743          | -0.4434        | -6.7628          | 0.4600             | 6.3194          | -63.5646       | -43.6836     | -2.8853         | -2.8226       |
| 0.3466        | 7.6   | 950  | 0.3743          | -0.4431        | -6.7625          | 0.4600             | 6.3194          | -63.5635       | -43.6825     | -2.8851         | -2.8225       |
| 0.3812        | 8.0   | 1000 | 0.3743          | -0.4432        | -6.7623          | 0.4600             | 6.3191          | -63.5627       | -43.6829     | -2.8851         | -2.8225       |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1
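### How to use

A minimal inference sketch for loading the published checkpoint with `transformers`; the prompt and generation settings below are illustrative only and are not specified by this card.

```python
# Sketch: load the published checkpoint and generate. The prompt is a
# placeholder; no prompt format is documented for this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tsavage68/IE_M2_1000steps_1e7rate_03beta_SFT"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.float16,  # illustrative; requires a GPU for fp16
    device_map="auto",          # requires the accelerate package
)

inputs = tokenizer("Your prompt here", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```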