---
license: mit
base_model: openai-community/gpt2
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: gpt2-dpo-from_base_gpt2
    results: []
---

# gpt2-dpo-from_base_gpt2

This model is a fine-tuned version of [openai-community/gpt2](https://huggingface.co/openai-community/gpt2), trained with DPO on an unspecified preference dataset. It achieves the following results on the evaluation set (see the note on the reward metrics after this list):

- Loss: 0.6406
- Rewards/chosen: 1.1312
- Rewards/rejected: 0.9208
- Rewards/accuracies: 0.6373
- Rewards/margins: 0.2103
- Logps/rejected: -429.5498
- Logps/chosen: -508.5024
- Logits/rejected: -96.1598
- Logits/chosen: -94.9073
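
In TRL's DPO convention, the reward for a completion is the β-scaled log-ratio of policy to reference probabilities, so Rewards/margins is the gap between the chosen and rejected rewards, and Rewards/accuracies is the fraction of evaluation pairs for which the chosen completion scores higher. The value of β is not reported in this card (TRL's default is 0.1). The training objective is the standard DPO loss:

```latex
\mathcal{L}_{\mathrm{DPO}}
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        \;-\;
        \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

where \(y_w\) and \(y_l\) are the chosen and rejected completions; the two β-scaled log-ratio terms are exactly Rewards/chosen and Rewards/rejected above.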

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training; a sketch of how they map onto TRL's `DPOTrainer` follows the list:

- learning_rate: 1e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 10
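
The card does not include the training script. Below is a minimal sketch of how these hyperparameters could be wired into TRL's `DPOTrainer`, assuming the TRL ~0.8 API that pairs with the Transformers 4.40.2 pin listed further down; the dataset name and β are placeholders, not values from this card.

```python
# Minimal sketch, not the author's actual script: reproduces the
# hyperparameters listed above with TRL's DPOTrainer (TRL ~0.8 API).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
ref_model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Hypothetical preference dataset with "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("your-org/your-preference-dataset")

args = TrainingArguments(
    output_dir="gpt2-dpo-from_base_gpt2",
    learning_rate=1e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 8 * 4 = effective batch size 32
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.2,
    seed=42,
    evaluation_strategy="epoch",
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,  # assumption: β is not reported in the card; 0.1 is TRL's default
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```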

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6679        | 0.9993 | 668  | 0.6728          | 0.2747         | 0.2209           | 0.625              | 0.0538          | -436.5490      | -517.0669    | -96.0258        | -94.8005      |
| 0.6697        | 2.0    | 1337 | 0.6545          | 0.6507         | 0.5283           | 0.6295             | 0.1224          | -433.4745      | -513.3065    | -96.0560        | -94.8147      |
| 0.6516        | 2.9993 | 2005 | 0.6467          | 0.8424         | 0.6867           | 0.6336             | 0.1557          | -431.8912      | -511.3903    | -96.1361        | -94.8919      |
| 0.6264        | 4.0    | 2674 | 0.6436          | 0.9803         | 0.7989           | 0.6336             | 0.1814          | -430.7686      | -510.0109    | -96.1278        | -94.8762      |
| 0.6114        | 4.9993 | 3342 | 0.6420          | 1.0453         | 0.8518           | 0.6377             | 0.1935          | -430.2403      | -509.3612    | -96.1435        | -94.8917      |
| 0.6016        | 6.0    | 4011 | 0.6412          | 1.0870         | 0.8859           | 0.6377             | 0.2011          | -429.8991      | -508.9442    | -96.1471        | -94.8941      |
| 0.6115        | 6.9993 | 4679 | 0.6408          | 1.1137         | 0.9071           | 0.6384             | 0.2066          | -429.6871      | -508.6768    | -96.1587        | -94.9064      |
| 0.6079        | 8.0    | 5348 | 0.6406          | 1.1274         | 0.9178           | 0.6388             | 0.2096          | -429.5802      | -508.5403    | -96.1573        | -94.9046      |
| 0.6066        | 8.9993 | 6016 | 0.6406          | 1.1310         | 0.9207           | 0.6373             | 0.2103          | -429.5507      | -508.5036    | -96.1593        | -94.9068      |
| 0.5968        | 9.9925 | 6680 | 0.6406          | 1.1312         | 0.9208           | 0.6373             | 0.2103          | -429.5498      | -508.5024    | -96.1598        | -94.9073      |

### Framework versions

- Transformers 4.40.2
- Pytorch 2.1.0+cu118
- Datasets 2.19.1
- Tokenizers 0.19.1
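
For completeness, a minimal loading sketch; the Hub repo id below is an assumption based on the model name in this card, so substitute the actual path or a local checkpoint.

```python
# Minimal inference sketch for the fine-tuned model.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Luca-Engel/gpt2-dpo-from_base_gpt2"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The quickest way to learn a language is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```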