
zephyr-dpo-qlora-uf-ours-uffull-5e-6

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/UF and generation/UFfull2 datasets. It achieves the following results on the evaluation set (the reward metrics are defined in the sketch after this list):

  • Loss: 0.4948
  • Rewards/chosen: -1.7888
  • Rewards/rejected: -2.8835
  • Rewards/accuracies: 0.7485
  • Rewards/margins: 1.0946
  • Rewards/margins Max: 3.5873
  • Rewards/margins Min: -0.9701
  • Rewards/margins Std: 1.5436
  • Logps/rejected: -554.2000
  • Logps/chosen: -463.3372
  • Logits/rejected: -1.5538
  • Logits/chosen: -1.6206
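
The Rewards/* metrics follow the DPO convention of implicit rewards: β-scaled log-probability ratios between the policy and the reference (SFT) model. Below is a minimal sketch of how these quantities are computed, assuming the standard DPO formulation; the β value is not stated in this card, so the 0.1 default here is purely illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps: torch.Tensor,
                policy_rejected_logps: torch.Tensor,
                ref_chosen_logps: torch.Tensor,
                ref_rejected_logps: torch.Tensor,
                beta: float = 0.1):
    """Implicit DPO rewards and loss; beta=0.1 is an illustrative default."""
    # Rewards/chosen and Rewards/rejected: beta-scaled policy-vs-reference log-ratios
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected    # Rewards/margins
    accuracy = (margins > 0).float().mean()        # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()           # DPO objective
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), accuracy
```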

Model description

More information needed

Intended uses & limitations

More information needed
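
The card does not document intended uses. Since this repository is a PEFT (QLoRA) adapter, the usual pattern is to load it on top of the base SFT model. A minimal inference sketch, assuming the standard peft/transformers APIs; the repository IDs are taken from this card, and the generation settings are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "just1nseo/zephyr-dpo-qlora-uf-ours-uffull-5e-6"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO adapter

# Zephyr models are chat-tuned, so format the prompt with the chat template.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```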

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
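
For reference, the values above map directly onto transformers.TrainingArguments fields. A minimal sketch, assuming the standard Hugging Face trainer stack; the output directory and bf16 flag are assumptions not stated in the card:

```python
from transformers import TrainingArguments

# With 2 GPUs, the effective batch sizes match the card:
#   train: 4 per device x 2 devices x 2 accumulation steps = 16
#   eval:  8 per device x 2 devices = 16
args = TrainingArguments(
    output_dir="zephyr-dpo-qlora-uf-ours-uffull-5e-6",  # illustrative
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    bf16=True,  # assumption: precision is not stated in the card
)
```

In the alignment-handbook setup, arguments like these would typically be passed to trl's DPOTrainer together with a LoraConfig describing the QLoRA adapter.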

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.6903 | 0.02 | 100 | 0.6905 | 0.0096 | 0.0042 | 0.6635 | 0.0055 | 0.0279 | -0.0135 | 0.0138 | -265.4348 | -283.4918 | -2.7667 | -2.8015 |
| 0.6668 | 0.05 | 200 | 0.6714 | 0.0249 | -0.0232 | 0.6645 | 0.0481 | 0.2299 | -0.1105 | 0.1130 | -268.1768 | -281.9665 | -2.7343 | -2.7676 |
| 0.6136 | 0.07 | 300 | 0.6388 | -0.2723 | -0.4201 | 0.6695 | 0.1478 | 0.6956 | -0.3145 | 0.3388 | -307.8617 | -311.6826 | -2.6777 | -2.7086 |
| 0.6224 | 0.1 | 400 | 0.6072 | -0.4408 | -0.7266 | 0.6825 | 0.2858 | 1.2193 | -0.5526 | 0.5951 | -338.5125 | -328.5356 | -2.5218 | -2.5541 |
| 0.5913 | 0.12 | 500 | 0.5700 | -0.6299 | -1.0928 | 0.6975 | 0.4629 | 1.7719 | -0.6554 | 0.8141 | -375.1356 | -347.4472 | -2.1793 | -2.2226 |
| 0.5721 | 0.14 | 600 | 0.5595 | -1.1081 | -1.7353 | 0.7145 | 0.6271 | 2.2934 | -0.8628 | 1.0597 | -439.3786 | -395.2698 | -2.0549 | -2.1036 |
| 0.4888 | 0.17 | 700 | 0.5546 | -1.4460 | -2.1425 | 0.7085 | 0.6965 | 2.5873 | -0.9396 | 1.1811 | -480.1024 | -429.0589 | -1.7782 | -1.8362 |
| 0.4774 | 0.19 | 800 | 0.5258 | -1.2110 | -1.9801 | 0.7270 | 0.7691 | 2.5889 | -0.8329 | 1.1591 | -463.8646 | -405.5573 | -1.9074 | -1.9645 |
| 0.521 | 0.22 | 900 | 0.5286 | -1.4043 | -2.2106 | 0.7355 | 0.8063 | 2.8030 | -0.8890 | 1.2406 | -486.9130 | -424.8805 | -1.5390 | -1.5999 |
| 0.4871 | 0.24 | 1000 | 0.5354 | -1.0617 | -1.8924 | 0.7250 | 0.8307 | 2.9996 | -0.8983 | 1.3137 | -455.0902 | -390.6243 | -1.7795 | -1.8273 |
| 0.5574 | 0.26 | 1100 | 0.5379 | -1.2560 | -2.0556 | 0.7205 | 0.7996 | 3.0463 | -0.8879 | 1.3085 | -471.4182 | -410.0581 | -1.6403 | -1.6951 |
| 0.5017 | 0.29 | 1200 | 0.5261 | -1.3320 | -2.1724 | 0.7295 | 0.8404 | 2.9985 | -0.8951 | 1.3031 | -483.0894 | -417.6535 | -1.7025 | -1.7570 |
| 0.4478 | 0.31 | 1300 | 0.5277 | -1.7254 | -2.6499 | 0.7230 | 0.9245 | 3.2834 | -1.0237 | 1.4394 | -530.8426 | -456.9910 | -1.7244 | -1.7779 |
| 0.4919 | 0.34 | 1400 | 0.5189 | -1.1742 | -2.0426 | 0.7365 | 0.8684 | 3.0337 | -0.9052 | 1.3302 | -470.1158 | -401.8751 | -1.5533 | -1.6223 |
| 0.4792 | 0.36 | 1500 | 0.5205 | -1.3947 | -2.3310 | 0.7340 | 0.9364 | 3.1265 | -0.9863 | 1.3913 | -498.9553 | -423.9220 | -1.6972 | -1.7596 |
| 0.4952 | 0.38 | 1600 | 0.5316 | -1.8397 | -2.8176 | 0.7290 | 0.9779 | 3.2675 | -1.0997 | 1.4769 | -547.6121 | -468.4282 | -1.8293 | -1.8827 |
| 0.5084 | 0.41 | 1700 | 0.5285 | -2.4336 | -3.4484 | 0.7295 | 1.0147 | 3.4046 | -1.1112 | 1.5199 | -610.6892 | -527.8181 | -1.5473 | -1.6112 |
| 0.4676 | 0.43 | 1800 | 0.5162 | -1.8360 | -2.7043 | 0.7370 | 0.8683 | 2.8969 | -0.9280 | 1.2953 | -536.2840 | -468.0518 | -1.5045 | -1.5680 |
| 0.4588 | 0.45 | 1900 | 0.5073 | -1.5345 | -2.4614 | 0.7435 | 0.9269 | 3.0227 | -0.9141 | 1.3341 | -511.9908 | -437.9078 | -1.3109 | -1.3855 |
| 0.4826 | 0.48 | 2000 | 0.5104 | -1.6277 | -2.6050 | 0.7385 | 0.9773 | 3.2595 | -0.9829 | 1.4282 | -526.3553 | -447.2241 | -1.3208 | -1.3956 |
| 0.4925 | 0.5 | 2100 | 0.5079 | -1.6078 | -2.5256 | 0.7355 | 0.9178 | 2.9879 | -0.9518 | 1.3324 | -518.4150 | -445.2356 | -1.5277 | -1.5931 |
| 0.546 | 0.53 | 2200 | 0.5100 | -1.7097 | -2.6882 | 0.7370 | 0.9785 | 3.1492 | -1.0011 | 1.4117 | -534.6687 | -455.4216 | -1.4247 | -1.4938 |
| 0.4958 | 0.55 | 2300 | 0.5047 | -1.4824 | -2.3935 | 0.7385 | 0.9111 | 2.9984 | -0.8454 | 1.2951 | -505.2043 | -432.6925 | -1.6758 | -1.7328 |
| 0.4757 | 0.57 | 2400 | 0.5021 | -1.6699 | -2.6304 | 0.7380 | 0.9605 | 3.1590 | -0.8924 | 1.3656 | -528.8900 | -451.4436 | -1.4670 | -1.5347 |
| 0.4539 | 0.6 | 2500 | 0.5025 | -1.7424 | -2.7890 | 0.7400 | 1.0466 | 3.4316 | -1.0034 | 1.5001 | -544.7556 | -458.6970 | -1.5551 | -1.6231 |
| 0.4612 | 0.62 | 2600 | 0.4991 | -1.7503 | -2.8124 | 0.7415 | 1.0621 | 3.4721 | -0.9695 | 1.5041 | -547.0907 | -459.4844 | -1.4927 | -1.5622 |
| 0.5267 | 0.65 | 2700 | 0.4989 | -1.5988 | -2.5869 | 0.7410 | 0.9881 | 3.2210 | -0.9401 | 1.4114 | -524.5454 | -444.3344 | -1.5476 | -1.6161 |
| 0.4999 | 0.67 | 2800 | 0.4974 | -1.6001 | -2.5954 | 0.7470 | 0.9953 | 3.2272 | -0.8964 | 1.3973 | -525.3958 | -444.4690 | -1.5260 | -1.5935 |
| 0.4589 | 0.69 | 2900 | 0.4977 | -1.7829 | -2.8625 | 0.7415 | 1.0796 | 3.5812 | -0.9488 | 1.5304 | -552.1008 | -462.7464 | -1.5484 | -1.6154 |
| 0.4433 | 0.72 | 3000 | 0.4995 | -1.7820 | -2.8827 | 0.7395 | 1.1007 | 3.6468 | -0.9945 | 1.5727 | -554.1236 | -462.6560 | -1.5922 | -1.6589 |
| 0.4908 | 0.74 | 3100 | 0.4970 | -1.7323 | -2.7993 | 0.7415 | 1.0669 | 3.5268 | -0.9553 | 1.5148 | -545.7810 | -457.6894 | -1.6165 | -1.6807 |
| 0.4325 | 0.77 | 3200 | 0.4972 | -1.3958 | -2.4076 | 0.75 | 1.0117 | 3.3475 | -0.9045 | 1.4383 | -506.6104 | -424.0385 | -1.6999 | -1.7600 |
| 0.4645 | 0.79 | 3300 | 0.4970 | -1.7218 | -2.8037 | 0.7485 | 1.0819 | 3.5295 | -0.9807 | 1.5290 | -546.2211 | -456.6324 | -1.5845 | -1.6505 |
| 0.4612 | 0.81 | 3400 | 0.4980 | -1.8787 | -2.9919 | 0.7445 | 1.1132 | 3.6640 | -1.0013 | 1.5776 | -565.0459 | -472.3241 | -1.4980 | -1.5678 |
| 0.4023 | 0.84 | 3500 | 0.4987 | -2.0641 | -3.1949 | 0.7410 | 1.1308 | 3.7331 | -1.0134 | 1.6034 | -585.3400 | -490.8608 | -1.4923 | -1.5625 |
| 0.4564 | 0.86 | 3600 | 0.4952 | -1.8890 | -2.9834 | 0.7445 | 1.0943 | 3.5913 | -0.9690 | 1.5435 | -564.1885 | -473.3587 | -1.5268 | -1.5955 |
| 0.4337 | 0.89 | 3700 | 0.4948 | -1.7899 | -2.8791 | 0.7480 | 1.0892 | 3.5650 | -0.9671 | 1.5348 | -553.7646 | -463.4457 | -1.5501 | -1.6174 |
| 0.4687 | 0.91 | 3800 | 0.4949 | -1.7971 | -2.8908 | 0.7475 | 1.0937 | 3.5845 | -0.9702 | 1.5427 | -554.9319 | -464.1627 | -1.5573 | -1.6238 |
| 0.4624 | 0.93 | 3900 | 0.4946 | -1.7588 | -2.8495 | 0.7480 | 1.0908 | 3.5789 | -0.9633 | 1.5386 | -550.8040 | -460.3306 | -1.5625 | -1.6288 |
| 0.4744 | 0.96 | 4000 | 0.4948 | -1.7812 | -2.8753 | 0.7470 | 1.0941 | 3.5851 | -0.9685 | 1.5428 | -553.3815 | -462.5721 | -1.5573 | -1.6239 |
| 0.4294 | 0.98 | 4100 | 0.4950 | -1.7859 | -2.8799 | 0.7480 | 1.0940 | 3.5863 | -0.9706 | 1.5436 | -553.8444 | -463.0418 | -1.5527 | -1.6196 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2