---
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
  - generation/UF
  - generation/UFfull2
library_name: peft
license: apache-2.0
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: zephyr-dpo-qlora-uf-ours-uffull-5e-7
    results: []
---

zephyr-dpo-qlora-uf-ours-uffull-5e-7

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/UF and the generation/UFfull2 datasets. It achieves the following results on the evaluation set:

  • Loss: 0.5926
  • Rewards/chosen: -0.2031
  • Rewards/rejected: -0.5182
  • Rewards/accuracies: 0.7065
  • Rewards/margins: 0.3151
  • Rewards/margins Max: 1.1834
  • Rewards/margins Min: -0.5406
  • Rewards/margins Std: 0.5821
  • Logps/rejected: -317.6757
  • Logps/chosen: -304.7639
  • Logits/rejected: -2.5747
  • Logits/chosen: -2.6051
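
To try the adapter, it can be loaded on top of the base model with peft and transformers. The snippet below is a minimal loading sketch rather than code from the training repository: the Hub adapter id and the 4-bit quantization settings are assumptions.

```python
# Minimal loading sketch (assumed: adapter repo id and 4-bit settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "just1nseo/zephyr-dpo-qlora-uf-ours-uffull-5e-7"  # assumed Hub path

# QLoRA adapters are normally applied on top of a 4-bit quantized base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```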

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
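
For orientation, these settings give an effective batch size of 4 per device × 2 devices × 2 accumulation steps = 16. A trl DPOTrainer setup mirroring the hyperparameters might look like the sketch below; the LoRA configuration, DPO beta, dataset files, and precision flags are assumptions not stated in this card, and the trainer API shown is the one from this era of trl.

```python
# Rough training sketch mirroring the hyperparameters above (assumed: trl
# DPOTrainer API of this era, LoRA settings, beta, and dataset handling).
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "alignment-handbook/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Placeholder: the card's generation/UF and generation/UFfull2 preference
# splits would be loaded and merged here in prompt/chosen/rejected format.
train_dataset = load_dataset("json", data_files="uf_uffull2_train.jsonl")["train"]
eval_dataset = load_dataset("json", data_files="uf_uffull2_eval.jsonl")["train"]

peft_config = LoraConfig(  # QLoRA adapter settings (assumed, not in the card)
    r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM"
)

training_args = TrainingArguments(
    output_dir="zephyr-dpo-qlora-uf-ours-uffull-5e-7",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # with 2 GPUs: 4 * 2 * 2 = 16 total
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,              # with a PEFT adapter, the frozen base acts as reference
    args=training_args,
    beta=0.01,                   # assumed; not stated in the card
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```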

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6931 | 0.02 | 100 | 0.6930 | 0.0002 | -0.0000 | 0.5130 | 0.0002 | 0.0047 | -0.0043 | 0.0030 | -265.8543 | -284.4388 | -2.7682 | -2.8031 |
| 0.692 | 0.05 | 200 | 0.6923 | 0.0017 | 0.0000 | 0.6220 | 0.0017 | 0.0099 | -0.0057 | 0.0051 | -265.8525 | -284.2892 | -2.7668 | -2.8017 |
| 0.6903 | 0.07 | 300 | 0.6908 | 0.0067 | 0.0019 | 0.6520 | 0.0048 | 0.0253 | -0.0125 | 0.0125 | -265.6623 | -283.7856 | -2.7627 | -2.7978 |
| 0.6888 | 0.1 | 400 | 0.6880 | 0.0104 | -0.0004 | 0.6645 | 0.0108 | 0.0545 | -0.0264 | 0.0268 | -265.8943 | -283.4167 | -2.7573 | -2.7924 |
| 0.6827 | 0.12 | 500 | 0.6834 | 0.0345 | 0.0138 | 0.6820 | 0.0207 | 0.0989 | -0.0454 | 0.0479 | -264.4715 | -281.0052 | -2.7529 | -2.7877 |
| 0.6831 | 0.14 | 600 | 0.6776 | 0.0296 | -0.0039 | 0.6910 | 0.0335 | 0.1552 | -0.0696 | 0.0745 | -266.2422 | -281.4937 | -2.7479 | -2.7827 |
| 0.6652 | 0.17 | 700 | 0.6700 | 0.0086 | -0.0427 | 0.6820 | 0.0513 | 0.2350 | -0.1057 | 0.1128 | -270.1202 | -283.5948 | -2.7382 | -2.7726 |
| 0.6486 | 0.19 | 800 | 0.6615 | -0.0198 | -0.0921 | 0.6805 | 0.0723 | 0.3237 | -0.1470 | 0.1565 | -275.0622 | -286.4378 | -2.7367 | -2.7702 |
| 0.6457 | 0.22 | 900 | 0.6531 | -0.0599 | -0.1549 | 0.6755 | 0.0950 | 0.4216 | -0.1947 | 0.2059 | -281.3418 | -290.4436 | -2.7168 | -2.7500 |
| 0.6356 | 0.24 | 1000 | 0.6449 | -0.0625 | -0.1814 | 0.6785 | 0.1188 | 0.5225 | -0.2486 | 0.2583 | -283.9890 | -290.7086 | -2.7042 | -2.7362 |
| 0.6465 | 0.26 | 1100 | 0.6378 | -0.0291 | -0.1702 | 0.6775 | 0.1411 | 0.6108 | -0.2946 | 0.3031 | -282.8690 | -287.3659 | -2.6982 | -2.7301 |
| 0.6121 | 0.29 | 1200 | 0.6317 | -0.0658 | -0.2261 | 0.6780 | 0.1603 | 0.6847 | -0.3354 | 0.3418 | -288.4626 | -291.0350 | -2.6893 | -2.7208 |
| 0.6113 | 0.31 | 1300 | 0.6287 | -0.1819 | -0.3556 | 0.6820 | 0.1737 | 0.7287 | -0.3416 | 0.3621 | -301.4144 | -302.6470 | -2.6941 | -2.7251 |
| 0.6058 | 0.34 | 1400 | 0.6234 | -0.1290 | -0.3204 | 0.6775 | 0.1914 | 0.7908 | -0.3943 | 0.3995 | -297.8902 | -297.3538 | -2.6823 | -2.7135 |
| 0.6169 | 0.36 | 1500 | 0.6194 | -0.1244 | -0.3286 | 0.6790 | 0.2042 | 0.8341 | -0.4094 | 0.4197 | -298.7180 | -296.9003 | -2.6648 | -2.6957 |
| 0.5809 | 0.38 | 1600 | 0.6163 | -0.1125 | -0.3291 | 0.6800 | 0.2167 | 0.8823 | -0.4243 | 0.4399 | -298.7659 | -295.7021 | -2.6547 | -2.6853 |
| 0.5979 | 0.41 | 1700 | 0.6161 | -0.2126 | -0.4403 | 0.6805 | 0.2276 | 0.9153 | -0.4469 | 0.4624 | -309.8821 | -305.7201 | -2.6466 | -2.6773 |
| 0.6034 | 0.43 | 1800 | 0.6124 | -0.1652 | -0.4014 | 0.6805 | 0.2362 | 0.9410 | -0.4507 | 0.4726 | -305.9889 | -300.9712 | -2.6365 | -2.6672 |
| 0.5983 | 0.45 | 1900 | 0.6144 | -0.0531 | -0.2743 | 0.6900 | 0.2212 | 0.8923 | -0.3931 | 0.4327 | -293.2797 | -289.7628 | -2.6389 | -2.6689 |
| 0.5822 | 0.48 | 2000 | 0.6049 | -0.1502 | -0.4096 | 0.6885 | 0.2593 | 1.0070 | -0.4697 | 0.4998 | -306.8109 | -299.4801 | -2.6378 | -2.6679 |
| 0.6013 | 0.5 | 2100 | 0.6034 | -0.1787 | -0.4453 | 0.6870 | 0.2666 | 1.0331 | -0.4819 | 0.5137 | -310.3860 | -302.3300 | -2.6289 | -2.6593 |
| 0.6018 | 0.53 | 2200 | 0.6019 | -0.1572 | -0.4295 | 0.6925 | 0.2723 | 1.0473 | -0.4896 | 0.5205 | -308.8055 | -300.1773 | -2.6287 | -2.6585 |
| 0.6121 | 0.55 | 2300 | 0.6010 | -0.2434 | -0.5217 | 0.6905 | 0.2783 | 1.0633 | -0.4893 | 0.5289 | -318.0273 | -308.7991 | -2.6178 | -2.6476 |
| 0.5698 | 0.57 | 2400 | 0.5979 | -0.1902 | -0.4780 | 0.6920 | 0.2878 | 1.0879 | -0.4939 | 0.5369 | -313.6557 | -303.4752 | -2.6092 | -2.6389 |
| 0.5656 | 0.6 | 2500 | 0.5992 | -0.2708 | -0.5597 | 0.6985 | 0.2889 | 1.0980 | -0.5097 | 0.5454 | -321.8217 | -311.5382 | -2.5991 | -2.6291 |
| 0.5795 | 0.62 | 2600 | 0.5950 | -0.2109 | -0.5113 | 0.6950 | 0.3003 | 1.1206 | -0.5079 | 0.5533 | -316.9805 | -305.5476 | -2.5944 | -2.6244 |
| 0.5909 | 0.65 | 2700 | 0.5945 | -0.2006 | -0.5044 | 0.6950 | 0.3038 | 1.1335 | -0.5150 | 0.5598 | -316.2979 | -304.5152 | -2.5934 | -2.6235 |
| 0.6097 | 0.67 | 2800 | 0.5938 | -0.2035 | -0.5091 | 0.6975 | 0.3055 | 1.1391 | -0.5171 | 0.5610 | -316.7604 | -304.8101 | -2.5909 | -2.6210 |
| 0.5776 | 0.69 | 2900 | 0.5929 | -0.2142 | -0.5232 | 0.7040 | 0.3091 | 1.1530 | -0.5251 | 0.5673 | -318.1778 | -305.8716 | -2.5874 | -2.6177 |
| 0.575 | 0.72 | 3000 | 0.5948 | -0.1848 | -0.4886 | 0.6980 | 0.3039 | 1.1465 | -0.5243 | 0.5647 | -314.7165 | -302.9333 | -2.5861 | -2.6165 |
| 0.5767 | 0.74 | 3100 | 0.5936 | -0.1972 | -0.5061 | 0.7010 | 0.3089 | 1.1551 | -0.5276 | 0.5690 | -316.4648 | -304.1734 | -2.5862 | -2.6166 |
| 0.5642 | 0.77 | 3200 | 0.5937 | -0.1943 | -0.5034 | 0.7010 | 0.3091 | 1.1615 | -0.5332 | 0.5726 | -316.1906 | -303.8846 | -2.5867 | -2.6170 |
| 0.5767 | 0.79 | 3300 | 0.5914 | -0.2376 | -0.5569 | 0.7050 | 0.3193 | 1.1828 | -0.5330 | 0.5823 | -321.5458 | -308.2144 | -2.5828 | -2.6131 |
| 0.5685 | 0.81 | 3400 | 0.5914 | -0.2246 | -0.5434 | 0.7045 | 0.3188 | 1.1858 | -0.5380 | 0.5834 | -320.1958 | -306.9150 | -2.5800 | -2.6103 |
| 0.5687 | 0.84 | 3500 | 0.5909 | -0.2343 | -0.5556 | 0.7045 | 0.3214 | 1.1905 | -0.5370 | 0.5855 | -321.4169 | -307.8832 | -2.5779 | -2.6082 |
| 0.5598 | 0.86 | 3600 | 0.5924 | -0.2063 | -0.5212 | 0.7060 | 0.3150 | 1.1819 | -0.5400 | 0.5817 | -317.9754 | -305.0805 | -2.5781 | -2.6084 |
| 0.5639 | 0.89 | 3700 | 0.5921 | -0.2090 | -0.5258 | 0.7055 | 0.3168 | 1.1849 | -0.5399 | 0.5831 | -318.4354 | -305.3578 | -2.5751 | -2.6056 |
| 0.5931 | 0.91 | 3800 | 0.5930 | -0.1985 | -0.5119 | 0.7060 | 0.3134 | 1.1790 | -0.5399 | 0.5802 | -317.0424 | -304.3084 | -2.5778 | -2.6081 |
| 0.5542 | 0.93 | 3900 | 0.5929 | -0.1989 | -0.5128 | 0.7060 | 0.3139 | 1.1807 | -0.5398 | 0.5808 | -317.1321 | -304.3491 | -2.5760 | -2.6064 |
| 0.5713 | 0.96 | 4000 | 0.5926 | -0.2022 | -0.5175 | 0.7050 | 0.3153 | 1.1831 | -0.5407 | 0.5823 | -317.6028 | -304.6741 | -2.5743 | -2.6048 |
| 0.5725 | 0.98 | 4100 | 0.5925 | -0.2025 | -0.5175 | 0.7060 | 0.3149 | 1.1833 | -0.5415 | 0.5824 | -317.5993 | -304.7070 | -2.5752 | -2.6056 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2