---
license: mit
base_model: openai-community/gpt2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: gpt2-dpo-from_base_gpt2
  results: []
---

# gpt2-dpo-from_base_gpt2

This model is a DPO fine-tuned version of [openai-community/gpt2](https://huggingface.co./openai-community/gpt2) on an unspecified dataset (the dataset name was not recorded at training time).
It achieves the following results on the evaluation set:
- Loss: 0.6406
- Rewards/chosen: 1.1312
- Rewards/rejected: 0.9208
- Rewards/accuracies: 0.6373
- Rewards/margins: 0.2103
- Logps/rejected: -429.5498
- Logps/chosen: -508.5024
- Logits/rejected: -96.1598
- Logits/chosen: -94.9073

The `Rewards/*` metrics are TRL's implicit DPO rewards: β-scaled log-probability ratios between this model and the reference model on the chosen and rejected completions. `Rewards/accuracies` is the fraction of preference pairs for which the chosen reward exceeds the rejected reward.

## Model description

More information needed

## Intended uses & limitations

More information needed. A hedged usage sketch is included at the end of this card.

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding TRL setup appears at the end of this card):
- learning_rate: 1e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 10

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6679        | 0.9993 | 668  | 0.6728          | 0.2747         | 0.2209           | 0.625              | 0.0538          | -436.5490      | -517.0669    | -96.0258        | -94.8005      |
| 0.6697        | 2.0    | 1337 | 0.6545          | 0.6507         | 0.5283           | 0.6295             | 0.1224          | -433.4745      | -513.3065    | -96.0560        | -94.8147      |
| 0.6516        | 2.9993 | 2005 | 0.6467          | 0.8424         | 0.6867           | 0.6336             | 0.1557          | -431.8912      | -511.3903    | -96.1361        | -94.8919      |
| 0.6264        | 4.0    | 2674 | 0.6436          | 0.9803         | 0.7989           | 0.6336             | 0.1814          | -430.7686      | -510.0109    | -96.1278        | -94.8762      |
| 0.6114        | 4.9993 | 3342 | 0.6420          | 1.0453         | 0.8518           | 0.6377             | 0.1935          | -430.2403      | -509.3612    | -96.1435        | -94.8917      |
| 0.6016        | 6.0    | 4011 | 0.6412          | 1.0870         | 0.8859           | 0.6377             | 0.2011          | -429.8991      | -508.9442    | -96.1471        | -94.8941      |
| 0.6115        | 6.9993 | 4679 | 0.6408          | 1.1137         | 0.9071           | 0.6384             | 0.2066          | -429.6871      | -508.6768    | -96.1587        | -94.9064      |
| 0.6079        | 8.0    | 5348 | 0.6406          | 1.1274         | 0.9178           | 0.6388             | 0.2096          | -429.5802      | -508.5403    | -96.1573        | -94.9046      |
| 0.6066        | 8.9993 | 6016 | 0.6406          | 1.1310         | 0.9207           | 0.6373             | 0.2103          | -429.5507      | -508.5036    | -96.1593        | -94.9068      |
| 0.5968        | 9.9925 | 6680 | 0.6406          | 1.1312         | 0.9208           | 0.6373             | 0.2103          | -429.5498      | -508.5024    | -96.1598        | -94.9073      |

### Framework versions

- Transformers 4.40.2
- PyTorch 2.1.0+cu118
- Datasets 2.19.1
- Tokenizers 0.19.1
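
## How to use

This card does not record an intended-use statement, so the snippet below is only a minimal inference sketch. It assumes the model is available under the repo id `gpt2-dpo-from_base_gpt2` (or a local checkpoint path of the same name); adjust `model_id` accordingly.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the card's name is also the Hub repo id or a local checkpoint path.
model_id = "gpt2-dpo-from_base_gpt2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since the base model is plain GPT-2 and the reward margins above are modest, expect base-GPT-2-like completions with a mild preference shift rather than instruction-following behavior.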
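
## Reproducing the training setup (sketch)

The `trl`/`dpo` tags and the hyperparameters above imply a TRL `DPOTrainer` run. The sketch below wires those hyperparameters into the TRL API as it existed around Transformers 4.40 (TRL ~0.8.x). The preference dataset and the DPO `beta` are not recorded on this card, so the dataset id below is a placeholder and `beta` is left at TRL's default of 0.1; both are assumptions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "openai-community/gpt2"
model = AutoModelForCausalLM.from_pretrained(base)      # policy to optimize
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen DPO reference
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token

# Placeholder: the actual preference dataset (prompt/chosen/rejected columns)
# is not recorded on this card.
dataset = load_dataset("your-org/your-preference-dataset")

args = TrainingArguments(
    output_dir="gpt2-dpo-from_base_gpt2",
    learning_rate=1e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 8 x 4 = total train batch size of 32
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.2,
    seed=42,
    evaluation_strategy="epoch",    # matches the per-epoch eval rows above
    remove_unused_columns=False,    # DPOTrainer consumes the raw text columns
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    # beta=0.1 is TRL's default and an assumption here; the card does not record it.
)
trainer.train()
```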