
zephyr-7b-dpop-uf6k-qlora-5e-6-epoch3

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/UF6konly dataset. It achieves the following results on the evaluation set (see the note after this list for how these metrics are typically defined):

  • Loss: 0.9050
  • Positive Losses: 1.9848
  • Dpo Losses: 0.6341
  • Rewards/chosen: 0.1274
  • Rewards/rejected: -0.0080
  • Rewards/accuracies: 0.7381
  • Rewards/margins: 0.1354
  • Rewards/margins Max: 0.3634
  • Rewards/margins Min: -0.0877
  • Rewards/margins Std: 0.2033
  • Logps/rejected: -259.9846
  • Logps/chosen: -272.4854
  • Logits/rejected: -2.7772
  • Logits/chosen: -2.8182
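
A note on the metric names above: the reward columns are the implicit DPO rewards, and "Positive Losses" matches the extra chosen-log-prob penalty of DPO-Positive (DPOP), which the model name suggests. The card does not state the exact loss used, so the following is only a sketch of the standard definitions, with temperature β, policy π_θ, frozen SFT reference π_ref, chosen response y_w, and rejected response y_l:

```latex
% Implicit DPO reward (per response y for prompt x)
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right]

% Per-pair DPO loss ("Dpo Losses")
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left( r_\theta(x, y_w) - r_\theta(x, y_l) \right)

% DPOP-style positive penalty ("Positive Losses"; an assumption based on the model name)
\mathcal{L}_{\mathrm{pos}} = \max\!\left( 0,\; \log \pi_{\mathrm{ref}}(y_w \mid x) - \log \pi_\theta(y_w \mid x) \right)
```

Under these definitions, Rewards/chosen and Rewards/rejected are the mean of r_θ over chosen and rejected responses, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs with a positive margin.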

Model description

More information needed

Intended uses & limitations

More information needed
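
That said, since this is a PEFT/QLoRA adapter on top of alignment-handbook/zephyr-7b-sft-full, loading should follow the usual base-plus-adapter pattern. A minimal, untested sketch, assuming the adapter is published under the repository ID just1nseo/zephyr-7b-dpop-uf6k-qlora-5e-6-epoch3:

```python
# Minimal loading sketch (untested). Assumes the adapter is hosted on the Hub
# under just1nseo/zephyr-7b-dpop-uf6k-qlora-5e-6-epoch3.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "alignment-handbook/zephyr-7b-sft-full"
ADAPTER = "just1nseo/zephyr-7b-dpop-uf6k-qlora-5e-6-epoch3"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER)  # attach the QLoRA adapter

messages = [{"role": "user", "content": "Explain direct preference optimization briefly."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```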

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
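
The exact training script (likely an alignment-handbook/TRL recipe) is not included in this card; for orientation only, the values above restated as transformers.TrainingArguments:

```python
# Hedged reconstruction of the hyperparameters above; not the actual training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpop-uf6k-qlora-5e-6-epoch3",
    learning_rate=5e-6,
    per_device_train_batch_size=2,  # x 8 devices -> total train batch size 16
    per_device_eval_batch_size=4,   # x 8 devices -> total eval batch size 32
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the TrainingArguments default.
)
```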

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6895 | 0.3 | 100 | 0.6866 | 0.0402 | 0.6797 | 0.0926 | 0.0647 | 0.6905 | 0.0279 | 0.0904 | -0.0340 | 0.0553 | -252.7092 | -275.9622 | -2.7972 | -2.8420 |
| 0.682 | 0.61 | 200 | 0.6814 | 0.0885 | 0.6688 | 0.1594 | 0.1077 | 0.7183 | 0.0518 | 0.1531 | -0.0501 | 0.0916 | -248.4167 | -269.2792 | -2.8159 | -2.8598 |
| 0.672 | 0.91 | 300 | 0.6821 | 0.1534 | 0.6642 | 0.1744 | 0.1117 | 0.7262 | 0.0626 | 0.1928 | -0.0674 | 0.1166 | -248.0109 | -267.7851 | -2.7544 | -2.7992 |
| 0.6084 | 1.22 | 400 | 0.7278 | 0.5292 | 0.6533 | 0.1649 | 0.0771 | 0.7183 | 0.0878 | 0.2556 | -0.0760 | 0.1485 | -251.4729 | -268.7308 | -2.7666 | -2.8121 |
| 0.6009 | 1.52 | 500 | 0.7518 | 0.7332 | 0.6465 | 0.1548 | 0.0515 | 0.7341 | 0.1033 | 0.2829 | -0.0732 | 0.1598 | -254.0323 | -269.7391 | -2.7510 | -2.7945 |
| 0.613 | 1.82 | 600 | 0.7997 | 1.0438 | 0.6410 | 0.1367 | 0.0201 | 0.7619 | 0.1166 | 0.3122 | -0.0805 | 0.1767 | -257.1772 | -271.5526 | -2.7849 | -2.8271 |
| 0.5504 | 2.13 | 700 | 0.8485 | 1.5090 | 0.6371 | 0.1361 | 0.0091 | 0.7540 | 0.1271 | 0.3381 | -0.0856 | 0.1905 | -258.2727 | -271.6062 | -2.7839 | -2.8261 |
| 0.5924 | 2.43 | 800 | 0.9247 | 2.1422 | 0.6323 | 0.1214 | -0.0182 | 0.7341 | 0.1396 | 0.3676 | -0.0900 | 0.2062 | -261.0059 | -273.0837 | -2.7895 | -2.8309 |
| 0.5608 | 2.74 | 900 | 0.9056 | 1.9910 | 0.6338 | 0.1271 | -0.0088 | 0.7460 | 0.1359 | 0.3635 | -0.0870 | 0.2030 | -260.0656 | -272.5118 | -2.7790 | -2.8200 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2