
zephyr-dpo-qlora-gpt4-5e-7-epoch3

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/GPT4 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6603
  • Rewards/chosen: -0.3016
  • Rewards/rejected: -0.3998
  • Rewards/accuracies: 0.5992
  • Rewards/margins: 0.0982
  • Rewards/margins Max: 0.5219
  • Rewards/margins Min: -0.3348
  • Rewards/margins Std: 0.3823
  • Logps/rejected: -299.1642
  • Logps/chosen: -315.3784
  • Logits/rejected: -2.6357
  • Logits/chosen: -2.6728
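
This repository contains a QLoRA adapter rather than full model weights, so it is loaded on top of the base SFT model. Below is a minimal loading sketch using transformers and peft; the 4-bit quantization settings are illustrative assumptions, and the adapter can also be applied to a full-precision copy of the base model.

```python
# Minimal inference sketch (quantization settings are assumptions, not taken from the card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "just1nseo/zephyr-dpo-qlora-gpt4-5e-7-epoch3"

# 4-bit loading mirrors the QLoRA setup; bf16/fp16 loading works as well.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If a standalone checkpoint is preferred, `model = model.merge_and_unload()` merges the adapter into the base weights after loading.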

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
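
The card does not include the training script itself. The sketch below shows how the hyperparameters above might map onto trl's DPOTrainer; the actual run used a QLoRA setup (likely via the alignment-handbook recipes), so details may differ. The LoRA configuration and the dataset loading are illustrative placeholders, since the card does not state them.

```python
# Hedged sketch: mapping the listed hyperparameters onto trl's DPOTrainer.
# LoRA values and the dataset path are placeholders, not taken from this card.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

base_id = "alignment-handbook/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Placeholder: the card only names a "generation/GPT4" preference dataset.
raw = load_dataset("json", data_files={"train": "gpt4_prefs.jsonl"})  # hypothetical path

training_args = DPOConfig(
    output_dir="zephyr-dpo-qlora-gpt4-5e-7-epoch3",
    learning_rate=5e-7,
    per_device_train_batch_size=2,   # train_batch_size above
    per_device_eval_batch_size=4,    # eval_batch_size above
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

peft_config = LoraConfig(            # placeholder values; the card omits LoRA settings
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                  # with a PEFT adapter, the frozen base model serves as the reference
    args=training_args,
    train_dataset=raw["train"],      # expected columns: prompt / chosen / rejected
    tokenizer=tokenizer,             # newer trl releases rename this to `processing_class`
    peft_config=peft_config,
)
trainer.train()
```

The total_train_batch_size of 16 listed above follows from the per-device batch size of 2 across the 8 GPUs, e.g. when launching the script with accelerate.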

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6815 | 0.28 | 100 | 0.6918 | -0.0019 | -0.0055 | 0.5516 | 0.0037 | 0.0181 | -0.0087 | 0.0120 | -259.7351 | -285.4075 | -2.8079 | -2.8531 |
| 0.6235 | 0.56 | 200 | 0.6873 | -0.0383 | -0.0542 | 0.5873 | 0.0160 | 0.0859 | -0.0499 | 0.0601 | -264.6065 | -289.0478 | -2.7712 | -2.8159 |
| 0.5521 | 0.85 | 300 | 0.6808 | -0.1327 | -0.1683 | 0.5952 | 0.0356 | 0.1823 | -0.1064 | 0.1266 | -276.0095 | -298.4897 | -2.7261 | -2.7701 |
| 0.4853 | 1.13 | 400 | 0.6749 | -0.2053 | -0.2614 | 0.6032 | 0.0561 | 0.2952 | -0.1704 | 0.2056 | -285.3263 | -305.7520 | -2.6873 | -2.7295 |
| 0.4561 | 1.41 | 500 | 0.6651 | -0.1807 | -0.2628 | 0.5913 | 0.0821 | 0.4091 | -0.2388 | 0.2874 | -285.4612 | -303.2937 | -2.6622 | -2.7037 |
| 0.4337 | 1.69 | 600 | 0.6630 | -0.2648 | -0.3479 | 0.6111 | 0.0831 | 0.4556 | -0.2917 | 0.3299 | -293.9761 | -311.7008 | -2.6522 | -2.6912 |
| 0.4052 | 1.97 | 700 | 0.6606 | -0.2499 | -0.3494 | 0.6151 | 0.0995 | 0.5023 | -0.3041 | 0.3604 | -294.1273 | -310.2143 | -2.6437 | -2.6819 |
| 0.3797 | 2.25 | 800 | 0.6601 | -0.2711 | -0.3716 | 0.6151 | 0.1005 | 0.5194 | -0.3194 | 0.3750 | -296.3420 | -312.3301 | -2.6373 | -2.6750 |
| 0.3692 | 2.54 | 900 | 0.6601 | -0.2914 | -0.3911 | 0.6032 | 0.0997 | 0.5207 | -0.3303 | 0.3804 | -298.2907 | -314.3626 | -2.6357 | -2.6730 |
| 0.3953 | 2.82 | 1000 | 0.6607 | -0.3036 | -0.4008 | 0.6032 | 0.0972 | 0.5193 | -0.3338 | 0.3807 | -299.2639 | -315.5808 | -2.6356 | -2.6727 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2