zephyr-7b-dpop-ours-qlora-5e-7-epoch3

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/UF dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9525
  • Positive Losses: 2.5650
  • Dpo Losses: 0.6654
  • Rewards/chosen: 0.0615
  • Rewards/rejected: -0.0035
  • Rewards/accuracies: 0.6370
  • Rewards/margins: 0.0649
  • Rewards/margins Max: 0.3501
  • Rewards/margins Min: -0.1835
  • Rewards/margins Std: 0.1778
  • Logps/rejected: -258.9347
  • Logps/chosen: -278.4630
  • Logits/rejected: -2.6770
  • Logits/chosen: -2.7150
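
As a quick consistency check on the figures above, the reported reward margin should roughly equal the chosen reward minus the rejected reward. The sketch below assumes the margin is the mean of per-example differences, as is usual in DPO-style trainers, so it agrees with the difference of the reported means only up to rounding. The variable names are illustrative.

```python
# Reported evaluation metrics, copied from the list above.
rewards_chosen = 0.0615
rewards_rejected = -0.0035
reported_margin = 0.0649

# Assumption: margin = mean(chosen - rejected), which equals
# mean(chosen) - mean(rejected) up to rounding of the reported values.
recomputed_margin = rewards_chosen - rewards_rejected

print(f"recomputed margin: {recomputed_margin:.4f}")
print(f"reported margin:   {reported_margin:.4f}")
print(f"difference:        {abs(recomputed_margin - reported_margin):.4f}")
```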

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
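
The batch-size and scheduler settings above can be sketched in a few lines. The effective batch size follows directly from the listed values; the learning-rate function assumes the common "linear warmup, then cosine decay to zero" shape, and `total_steps` is a hypothetical round number chosen for illustration, not the run's actual step count.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 4
num_devices = 2
gradient_accumulation_steps = 2
peak_lr = 5e-7
warmup_ratio = 0.1

# Effective number of examples per optimizer step.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup followed by cosine decay to zero (assumed shape)."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# HYPOTHETICAL total step count, for illustration only.
total_steps = 1000

print("effective train batch size:", total_train_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, total_steps))
```

The product 4 × 2 × 2 matches the listed total_train_batch_size of 16.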

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6888 | 0.28 | 100 | 0.6930 | 0.0090 | 0.6919 | 0.0144 | 0.0119 | 0.5960 | 0.0025 | 0.0168 | -0.0104 | 0.0091 | -257.3992 | -283.1658 | -2.7677 | -2.8066 |
| 0.6631 | 0.56 | 200 | 0.6984 | 0.1139 | 0.6859 | 0.0475 | 0.0323 | 0.5970 | 0.0152 | 0.0899 | -0.0497 | 0.0463 | -255.3604 | -279.8582 | -2.7518 | -2.7902 |
| 0.6296 | 0.85 | 300 | 0.7188 | 0.3524 | 0.6802 | 0.0671 | 0.0392 | 0.5990 | 0.0279 | 0.1601 | -0.0867 | 0.0826 | -254.6683 | -277.9036 | -2.7287 | -2.7668 |
| 0.6225 | 1.13 | 400 | 0.7561 | 0.7344 | 0.6753 | 0.0776 | 0.0381 | 0.6100 | 0.0395 | 0.2210 | -0.1158 | 0.1128 | -254.7784 | -276.8472 | -2.7128 | -2.7504 |
| 0.5986 | 1.41 | 500 | 0.8408 | 1.5299 | 0.6717 | 0.0653 | 0.0164 | 0.6140 | 0.0488 | 0.2718 | -0.1453 | 0.1394 | -256.9439 | -278.0837 | -2.6920 | -2.7297 |
| 0.6107 | 1.69 | 600 | 0.8630 | 1.7461 | 0.6689 | 0.0728 | 0.0171 | 0.6200 | 0.0557 | 0.3055 | -0.1594 | 0.1554 | -256.8792 | -277.3334 | -2.6848 | -2.7225 |
| 0.5944 | 1.97 | 700 | 0.8998 | 2.0818 | 0.6674 | 0.0676 | 0.0079 | 0.6250 | 0.0597 | 0.3249 | -0.1697 | 0.1649 | -257.8028 | -277.8536 | -2.6819 | -2.7197 |
| 0.5619 | 2.25 | 800 | 0.9346 | 2.3977 | 0.6662 | 0.0630 | 0.0001 | 0.6300 | 0.0629 | 0.3402 | -0.1784 | 0.1729 | -258.5778 | -278.3099 | -2.6844 | -2.7219 |
| 0.5725 | 2.54 | 900 | 0.9580 | 2.6145 | 0.6656 | 0.0590 | -0.0056 | 0.6290 | 0.0646 | 0.3487 | -0.1833 | 0.1774 | -259.1476 | -278.7048 | -2.6818 | -2.7195 |
| 0.5813 | 2.82 | 1000 | 0.9538 | 2.5739 | 0.6654 | 0.0612 | -0.0038 | 0.6280 | 0.0651 | 0.3501 | -0.1834 | 0.1778 | -258.9730 | -278.4868 | -2.6794 | -2.7173 |
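
The logged results can also be scanned programmatically, for example to see which step scores best on each metric. This is a minimal sketch using a subset of the columns copied by hand from the results above; the tuple layout and variable names are illustrative choices, not part of the training code.

```python
# (step, validation loss, DPO loss, reward accuracy), copied from the results above.
rows = [
    (100, 0.6930, 0.6919, 0.5960),
    (200, 0.6984, 0.6859, 0.5970),
    (300, 0.7188, 0.6802, 0.5990),
    (400, 0.7561, 0.6753, 0.6100),
    (500, 0.8408, 0.6717, 0.6140),
    (600, 0.8630, 0.6689, 0.6200),
    (700, 0.8998, 0.6674, 0.6250),
    (800, 0.9346, 0.6662, 0.6300),
    (900, 0.9580, 0.6656, 0.6290),
    (1000, 0.9538, 0.6654, 0.6280),
]

# Pick the logged step that is best on each metric.
best_val_loss = min(rows, key=lambda r: r[1])
best_dpo_loss = min(rows, key=lambda r: r[2])
best_accuracy = max(rows, key=lambda r: r[3])

print("lowest validation loss at step", best_val_loss[0])
print("lowest DPO loss at step", best_dpo_loss[0])
print("highest reward accuracy at step", best_accuracy[0])
```

On these figures the lowest validation loss is logged at step 100, the lowest DPO loss at step 1000, and the highest reward accuracy at step 800.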

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • PyTorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2