---
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
  - generation/UF
  - generation/UFfull2
library_name: peft
license: apache-2.0
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: zephyr-dpop-qlora-uf-ours-uffull-5e-7
    results: []
---

zephyr-dpop-qlora-uf-ours-uffull-5e-7

This model is a LoRA (PEFT) adapter trained with a DPO-style objective on top of alignment-handbook/zephyr-7b-sft-full, using the generation/UF and generation/UFfull2 datasets. It achieves the following results on the evaluation set:

  • Loss: 0.6824
  • Positive Losses: 0.1476
  • Dpo Losses: 0.6646
  • Rewards/chosen: 0.1662
  • Rewards/rejected: 0.1035
  • Rewards/accuracies: 0.6815
  • Rewards/margins: 0.0627
  • Rewards/margins Max: 0.2718
  • Rewards/margins Min: -0.1172
  • Rewards/margins Std: 0.1305
  • Logps/rejected: -255.5023
  • Logps/chosen: -267.8385
  • Logits/rejected: -2.7203
  • Logits/chosen: -2.7554
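
For context, the Rewards/* and Logps/* figures above follow the usual DPO bookkeeping: Logps/chosen and Logps/rejected are the policy's summed log-probabilities of the chosen and rejected completions, and the rewards are beta-scaled log-probability gaps against the frozen reference model. A minimal sketch of that convention (the beta value is a placeholder, since this card does not report it):

```python
import torch

# Sketch of the standard DPO "implicit reward" statistics (TRL-style convention).
# beta is assumed; the card does not state the value used for this run.
def dpo_reward_stats(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)        # "Rewards/chosen"
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)  # "Rewards/rejected"
    margins = rewards_chosen - rewards_rejected                             # "Rewards/margins"
    accuracy = (margins > 0).float().mean()                                 # "Rewards/accuracies"
    return rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracy
```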

Model description

More information needed

Intended uses & limitations

More information needed
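
Although detailed usage notes are not provided, the metadata marks this as a PEFT adapter on alignment-handbook/zephyr-7b-sft-full, so it should load roughly as in the sketch below. This is an assumption-laden example, not an official snippet: the repository id and generation settings are placeholders.

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "just1nseo/zephyr-dpop-qlora-uf-ours-uffull-5e-7"  # assumed repository id

# AutoPeftModelForCausalLM reads the adapter config, loads the recorded base model
# (alignment-handbook/zephyr-7b-sft-full), and attaches the LoRA weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("alignment-handbook/zephyr-7b-sft-full")

messages = [{"role": "user", "content": "Summarize what DPO training does in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                          return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```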

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
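
As a rough reconstruction, these settings map onto a transformers/PEFT configuration like the sketch below. This is illustrative only: the run's objective appears to be a DPOP-style variant (per the model name and the "Positive Losses" metric), and the LoRA rank/alpha/dropout and training precision are assumptions not stated on this card.

```python
from transformers import TrainingArguments
from peft import LoraConfig

# Mirrors the hyperparameters listed above; anything not listed there is an assumption.
training_args = TrainingArguments(
    output_dir="zephyr-dpop-qlora-uf-ours-uffull-5e-7",
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    gradient_accumulation_steps=2,   # with 2 GPUs -> total_train_batch_size = 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumed precision; not stated on the card
)

peft_config = LoraConfig(            # QLoRA adapter settings: assumed values
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
# These objects would be handed to a preference-optimization trainer (e.g. TRL's
# DPOTrainer, or a DPOP variant of it) along with the policy model, a frozen
# reference model, and the two preference datasets named above.
```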

Training results

Training Loss Epoch Step Validation Loss Positive Losses Dpo Losses Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Rewards/margins Max Rewards/margins Min Rewards/margins Std Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.694 0.02 100 0.6937 0.0064 0.6931 0.0049 0.0049 0.5075 0.0001 0.0049 -0.0046 0.0032 -265.3661 -283.9625 -2.7648 -2.8001
0.6922 0.05 200 0.6930 0.0035 0.6926 0.0082 0.0071 0.5875 0.0011 0.0082 -0.0056 0.0046 -265.1425 -283.6357 -2.7650 -2.8002
0.692 0.07 300 0.6921 0.0052 0.6914 0.0190 0.0154 0.6175 0.0035 0.0195 -0.0103 0.0099 -264.3096 -282.5598 -2.7662 -2.8012
0.6914 0.1 400 0.6907 0.0081 0.6896 0.0324 0.0252 0.6435 0.0072 0.0364 -0.0176 0.0181 -263.3349 -281.2179 -2.7620 -2.7972
0.6867 0.12 500 0.6887 0.0124 0.6868 0.0581 0.0451 0.6360 0.0130 0.0654 -0.0313 0.0323 -261.3455 -278.6435 -2.7580 -2.7932
0.6903 0.14 600 0.6869 0.0213 0.6837 0.0696 0.0499 0.6565 0.0197 0.0949 -0.0434 0.0461 -260.8595 -277.4952 -2.7576 -2.7926
0.6828 0.17 700 0.6855 0.0302 0.6813 0.0840 0.0592 0.6595 0.0248 0.1199 -0.0539 0.0580 -259.9324 -276.0511 -2.7490 -2.7843
0.6758 0.19 800 0.6855 0.0526 0.6791 0.0969 0.0672 0.6550 0.0297 0.1423 -0.0640 0.0688 -259.1296 -274.7613 -2.7450 -2.7804
0.6811 0.22 900 0.6854 0.0594 0.6771 0.1064 0.0725 0.6645 0.0339 0.1596 -0.0715 0.0771 -258.6040 -273.8141 -2.7378 -2.7726
0.6803 0.24 1000 0.6845 0.0609 0.6762 0.1167 0.0807 0.6645 0.0360 0.1687 -0.0763 0.0818 -257.7856 -272.7885 -2.7285 -2.7634
0.6759 0.26 1100 0.6842 0.0676 0.6750 0.1250 0.0862 0.6610 0.0388 0.1815 -0.0829 0.0881 -257.2345 -271.9526 -2.7320 -2.7672
0.6732 0.29 1200 0.6896 0.1405 0.6722 0.1179 0.0727 0.6695 0.0452 0.2076 -0.0939 0.1005 -258.5845 -272.6641 -2.7315 -2.7664
0.6748 0.31 1300 0.6835 0.0876 0.6734 0.1391 0.0966 0.6665 0.0425 0.1965 -0.0897 0.0954 -256.1944 -270.5492 -2.7357 -2.7709
0.6872 0.34 1400 0.6834 0.0973 0.6721 0.1392 0.0939 0.6670 0.0453 0.2070 -0.0930 0.1000 -256.4647 -270.5385 -2.7367 -2.7719
0.6926 0.36 1500 0.6833 0.1058 0.6710 0.1402 0.0925 0.6685 0.0477 0.2165 -0.0956 0.1042 -256.6026 -270.4324 -2.7329 -2.7681
0.6862 0.38 1600 0.6891 0.1729 0.6689 0.1322 0.0796 0.6750 0.0526 0.2361 -0.1039 0.1134 -257.8935 -271.2309 -2.7292 -2.7642
0.6779 0.41 1700 0.6821 0.0962 0.6698 0.1486 0.0979 0.6705 0.0507 0.2293 -0.1016 0.1104 -256.0604 -269.5961 -2.7308 -2.7658
0.6726 0.43 1800 0.6842 0.1209 0.6687 0.1467 0.0934 0.6730 0.0533 0.2380 -0.1060 0.1149 -256.5087 -269.7857 -2.7266 -2.7615
0.6688 0.45 1900 0.6834 0.1202 0.6681 0.1483 0.0938 0.6745 0.0545 0.2410 -0.1065 0.1162 -256.4724 -269.6281 -2.7300 -2.7651
0.6616 0.48 2000 0.6818 0.1092 0.6681 0.1532 0.0987 0.6720 0.0545 0.2409 -0.1069 0.1164 -255.9825 -269.1367 -2.7336 -2.7687
0.6707 0.5 2100 0.6804 0.0930 0.6684 0.1588 0.1049 0.6710 0.0538 0.2405 -0.1069 0.1162 -255.3586 -268.5765 -2.7300 -2.7651
0.6796 0.53 2200 0.6849 0.1551 0.6666 0.1500 0.0920 0.6755 0.0580 0.2565 -0.1121 0.1234 -256.6537 -269.4551 -2.7228 -2.7582
0.6672 0.55 2300 0.6830 0.1404 0.6668 0.1562 0.0986 0.6725 0.0576 0.2557 -0.1114 0.1231 -255.9975 -268.8366 -2.7203 -2.7554
0.6769 0.57 2400 0.6819 0.1252 0.6668 0.1596 0.1019 0.6740 0.0577 0.2565 -0.1128 0.1238 -255.6599 -268.4941 -2.7159 -2.7508
0.6725 0.6 2500 0.6903 0.2239 0.6645 0.1488 0.0859 0.6850 0.0630 0.2751 -0.1201 0.1325 -257.2663 -269.5727 -2.7161 -2.7509
0.6762 0.62 2600 0.6834 0.1472 0.6655 0.1615 0.1008 0.6760 0.0606 0.2671 -0.1166 0.1287 -255.7709 -268.3081 -2.7154 -2.7503
0.6867 0.65 2700 0.6846 0.1619 0.6649 0.1605 0.0985 0.6820 0.0620 0.2708 -0.1178 0.1304 -256.0078 -268.4086 -2.7205 -2.7554
0.702 0.67 2800 0.6836 0.1510 0.6651 0.1623 0.1007 0.6815 0.0616 0.2697 -0.1175 0.1299 -255.7832 -268.2218 -2.7157 -2.7510
0.6822 0.69 2900 0.6818 0.1312 0.6653 0.1655 0.1045 0.6800 0.0610 0.2669 -0.1156 0.1282 -255.4075 -267.9095 -2.7201 -2.7554
0.6751 0.72 3000 0.6809 0.1235 0.6656 0.1674 0.1070 0.6745 0.0604 0.2651 -0.1144 0.1272 -255.1547 -267.7156 -2.7193 -2.7547
0.673 0.74 3100 0.6830 0.1523 0.6648 0.1643 0.1022 0.6815 0.0621 0.2709 -0.1168 0.1301 -255.6314 -268.0210 -2.7211 -2.7563
0.6666 0.77 3200 0.6818 0.1381 0.6653 0.1672 0.1062 0.6785 0.0611 0.2675 -0.1157 0.1284 -255.2344 -267.7304 -2.7202 -2.7554
0.6619 0.79 3300 0.6829 0.1523 0.6647 0.1652 0.1028 0.6810 0.0624 0.2717 -0.1172 0.1304 -255.5768 -267.9396 -2.7207 -2.7559
0.6752 0.81 3400 0.6830 0.1530 0.6647 0.1653 0.1029 0.6805 0.0625 0.2718 -0.1177 0.1306 -255.5670 -267.9222 -2.7197 -2.7548
0.6711 0.84 3500 0.6841 0.1663 0.6643 0.1634 0.1000 0.6795 0.0633 0.2740 -0.1183 0.1317 -255.8493 -268.1196 -2.7188 -2.7540
0.669 0.86 3600 0.6843 0.1689 0.6642 0.1628 0.0992 0.6815 0.0637 0.2755 -0.1190 0.1323 -255.9366 -268.1706 -2.7180 -2.7533
0.6563 0.89 3700 0.6835 0.1602 0.6643 0.1642 0.1009 0.6815 0.0633 0.2740 -0.1182 0.1316 -255.7627 -268.0358 -2.7189 -2.7540
0.6811 0.91 3800 0.6828 0.1517 0.6646 0.1658 0.1032 0.6820 0.0627 0.2721 -0.1176 0.1307 -255.5359 -267.8722 -2.7190 -2.7541
0.664 0.93 3900 0.6823 0.1453 0.6647 0.1664 0.1039 0.6780 0.0625 0.2717 -0.1171 0.1305 -255.4641 -267.8119 -2.7221 -2.7571
0.6771 0.96 4000 0.6824 0.1453 0.6647 0.1662 0.1037 0.6775 0.0625 0.2716 -0.1174 0.1304 -255.4852 -267.8388 -2.7216 -2.7566
0.6644 0.98 4100 0.6825 0.1480 0.6646 0.1662 0.1036 0.6810 0.0626 0.2720 -0.1174 0.1305 -255.4913 -267.8348 -2.7189 -2.7542

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2