---
library_name: transformers
license: apache-2.0
base_model: tsavage68/IE_M2_1000steps_1e7rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: IE_M2_1000steps_1e6rate_03beta_cSFTDPO
  results: []
---


# IE_M2_1000steps_1e6rate_03beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/IE_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/IE_M2_1000steps_1e7rate_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3743
- Rewards/chosen: -0.5009
- Rewards/rejected: -8.2803
- Rewards/accuracies: 0.4600
- Rewards/margins: 7.7793
- Logps/rejected: -68.6227
- Logps/chosen: -43.8753
- Logits/rejected: -2.8766
- Logits/chosen: -2.8144
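
For readers unfamiliar with the DPO metric names, the reward margin is the gap between the rewards assigned to the chosen and rejected responses. The minimal Python sketch below checks that the reported margin agrees with the reported chosen and rejected rewards. It assumes the margin is simply chosen minus rejected, as the numbers above suggest; the helper name `reward_margin` is hypothetical and not part of any library.

```python
import math


def reward_margin(chosen: float, rejected: float) -> float:
    """Gap between the chosen and rejected rewards (hypothetical helper)."""
    return chosen - rejected


# Values copied from the evaluation results above.
reported_chosen = -0.5009
reported_rejected = -8.2803
reported_margin = 7.7793

margin = reward_margin(reported_chosen, reported_rejected)

# The card rounds each metric to four decimals, so allow a small tolerance.
matches_card = math.isclose(margin, reported_margin, abs_tol=2e-4)
print(f"margin = {margin:.4f}, matches card: {matches_card}")
```

The small difference in the last digit is consistent with each metric being rounded independently before it was written to the card.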

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
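
The batch-size and scheduler settings above interact: the total batch size is the per-device batch size multiplied by the gradient accumulation steps, and the learning rate warms up for 100 steps before decaying along a cosine curve. The sketch below reproduces that arithmetic in plain Python. It assumes a single device and the common shape of linear warmup followed by a half-cosine decay to zero; the function name `lr_at_step` is hypothetical, and the exact scheduler implementation used by the Trainer is not confirmed by this card.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-06
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000

# Examples consumed per optimizer step (assumes one device).
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS


def lr_at_step(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {lr_at_step(step):.2e}")
```

Under these assumptions the learning rate peaks at `1e-06` at step 100 and reaches zero at step 1000.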

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4505        | 0.4   | 50   | 0.3743          | -0.4606        | -7.3043          | 0.4600             | 6.8437          | -65.3694       | -43.7409     | -2.8804         | -2.8192       |
| 0.3812        | 0.8   | 100  | 0.3743          | -0.5073        | -7.6388          | 0.4600             | 7.1316          | -66.4846       | -43.8964     | -2.8732         | -2.8108       |
| 0.3119        | 1.2   | 150  | 0.3743          | -0.4846        | -7.8807          | 0.4600             | 7.3960          | -67.2907       | -43.8210     | -2.8767         | -2.8146       |
| 0.3639        | 1.6   | 200  | 0.3743          | -0.4919        | -7.9792          | 0.4600             | 7.4872          | -67.6190       | -43.8452     | -2.8768         | -2.8147       |
| 0.4332        | 2.0   | 250  | 0.3743          | -0.4951        | -8.0703          | 0.4600             | 7.5752          | -67.9228       | -43.8560     | -2.8769         | -2.8147       |
| 0.3986        | 2.4   | 300  | 0.3743          | -0.4967        | -8.1191          | 0.4600             | 7.6224          | -68.0855       | -43.8612     | -2.8768         | -2.8147       |
| 0.3986        | 2.8   | 350  | 0.3743          | -0.4916        | -8.1443          | 0.4600             | 7.6526          | -68.1694       | -43.8443     | -2.8768         | -2.8146       |
| 0.4505        | 3.2   | 400  | 0.3743          | -0.4891        | -8.2004          | 0.4600             | 7.7113          | -68.3565       | -43.8359     | -2.8768         | -2.8146       |
| 0.4505        | 3.6   | 450  | 0.3743          | -0.4982        | -8.2114          | 0.4600             | 7.7132          | -68.3931       | -43.8662     | -2.8766         | -2.8144       |
| 0.4332        | 4.0   | 500  | 0.3743          | -0.4973        | -8.2297          | 0.4600             | 7.7324          | -68.4541       | -43.8631     | -2.8766         | -2.8143       |
| 0.3292        | 4.4   | 550  | 0.3743          | -0.4993        | -8.2486          | 0.4600             | 7.7493          | -68.5172       | -43.8699     | -2.8765         | -2.8143       |
| 0.3639        | 4.8   | 600  | 0.3743          | -0.5006        | -8.2652          | 0.4600             | 7.7646          | -68.5726       | -43.8743     | -2.8767         | -2.8144       |
| 0.4505        | 5.2   | 650  | 0.3743          | -0.4997        | -8.2645          | 0.4600             | 7.7648          | -68.5701       | -43.8713     | -2.8765         | -2.8143       |
| 0.4505        | 5.6   | 700  | 0.3743          | -0.5034        | -8.2746          | 0.4600             | 7.7712          | -68.6037       | -43.8835     | -2.8765         | -2.8142       |
| 0.3639        | 6.0   | 750  | 0.3743          | -0.5002        | -8.2737          | 0.4600             | 7.7735          | -68.6009       | -43.8730     | -2.8765         | -2.8143       |
| 0.2426        | 6.4   | 800  | 0.3743          | -0.4991        | -8.2752          | 0.4600             | 7.7761          | -68.6059       | -43.8692     | -2.8768         | -2.8145       |
| 0.5025        | 6.8   | 850  | 0.3743          | -0.4985        | -8.2817          | 0.4600             | 7.7832          | -68.6276       | -43.8672     | -2.8766         | -2.8144       |
| 0.3119        | 7.2   | 900  | 0.3743          | -0.5001        | -8.2792          | 0.4600             | 7.7790          | -68.6191       | -43.8727     | -2.8765         | -2.8142       |
| 0.3466        | 7.6   | 950  | 0.3743          | -0.5010        | -8.2808          | 0.4600             | 7.7798          | -68.6245       | -43.8757     | -2.8766         | -2.8143       |
| 0.3812        | 8.0   | 1000 | 0.3743          | -0.5009        | -8.2803          | 0.4600             | 7.7793          | -68.6227       | -43.8753     | -2.8766         | -2.8144       |
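
The Epoch and Step columns also give a rough sense of the training set size, which this card otherwise leaves unstated. The back-of-the-envelope sketch below is an estimate only: it assumes a single device, no dropped partial batches, and that the rounded epoch value in the last row is accurate enough to divide by.

```python
# Values read from the last table row and the hyperparameter list.
final_step = 1000
final_epoch = 8.0
total_train_batch_size = 4

# Optimizer steps needed to pass over the training set once.
steps_per_epoch = final_step / final_epoch

# Each optimizer step consumes total_train_batch_size preference pairs.
approx_train_examples = steps_per_epoch * total_train_batch_size

print(f"about {steps_per_epoch:.0f} steps per epoch")
print(f"about {approx_train_examples:.0f} training pairs (estimate)")
```

Because the epoch column is rounded to one decimal place, the true figures may differ slightly from this estimate.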


### Framework versions

- Transformers 4.44.2
- Pytorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1