
zephyr-dpop-qlora-uf-5e-6

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (a loading sketch follows the list):

  • Loss: 0.6778
  • Positive Losses: 0.2511
  • Dpo Losses: 0.6380
  • Rewards/chosen: 0.2338
  • Rewards/rejected: 0.1078
  • Rewards/accuracies: 0.7220
  • Rewards/margins: 0.1260
  • Rewards/margins Max: 0.4590
  • Rewards/margins Min: -0.1531
  • Rewards/margins Std: 0.2063
  • Logps/rejected: -247.8000
  • Logps/chosen: -261.2173
  • Logits/rejected: -2.6358
  • Logits/chosen: -2.6679
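
This repository contains a PEFT (QLoRA) adapter rather than full model weights. The snippet below is a minimal loading sketch, not part of the original training code; it assumes the adapter repository also ships a tokenizer (otherwise load it from the base model, alignment-handbook/zephyr-7b-sft-full), and the prompt and generation settings are illustrative.

```python
# Minimal usage sketch (assumption: not taken from the original card).
# AutoPeftModelForCausalLM downloads the base model referenced by the adapter
# config (alignment-handbook/zephyr-7b-sft-full) and applies the LoRA weights.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "just1nseo/zephyr-dpop-qlora-uf-5e-6"

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Assumption: the adapter repo includes tokenizer files; otherwise point this
# at the base model instead.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

prompt = "Explain LoRA fine-tuning in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```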

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
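
As a rough guide, the sketch below shows how these settings might be expressed as `transformers.TrainingArguments`. The actual run used the alignment-handbook DPO/QLoRA recipe with additional DPO-specific options not reproduced here; `output_dir` and `optim` are illustrative assumptions.

```python
# Hedged sketch: mapping the hyperparameters above onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-dpop-qlora-uf-5e-6",  # assumption: any local path works
    learning_rate=5e-6,
    per_device_train_batch_size=4,           # train_batch_size
    per_device_eval_batch_size=8,            # eval_batch_size
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",                     # Adam with betas=(0.9, 0.999), eps=1e-8
)
# Effective train batch size: 4 per device x 2 GPUs x 2 accumulation steps = 16.
```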

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:---------------:|:----------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6921 | 0.03 | 100 | 0.6915 | 0.0120 | 0.6901 | 0.0248 | 0.0186 | 0.6650 | 0.0062 | 0.0289 | -0.0137 | 0.0139 | -256.7177 | -282.1105 | -2.7667 | -2.8058 |
| 0.6851 | 0.05 | 200 | 0.6926 | 0.0309 | 0.6807 | 0.0915 | 0.0656 | 0.6750 | 0.0259 | 0.1126 | -0.0512 | 0.0538 | -252.0206 | -275.4478 | -2.7565 | -2.7961 |
| 0.6861 | 0.08 | 300 | 0.6918 | 0.0759 | 0.6716 | 0.1634 | 0.1174 | 0.6840 | 0.0460 | 0.1992 | -0.0901 | 0.0953 | -246.8362 | -268.2485 | -2.7386 | -2.7784 |
| 0.7061 | 0.1 | 400 | 0.6930 | 0.1784 | 0.6638 | 0.1613 | 0.0976 | 0.6980 | 0.0637 | 0.2552 | -0.1140 | 0.1221 | -248.8232 | -268.4671 | -2.7244 | -2.7626 |
| 0.6898 | 0.13 | 500 | 0.6790 | 0.0692 | 0.6643 | 0.1944 | 0.1318 | 0.6960 | 0.0626 | 0.2532 | -0.1093 | 0.1204 | -245.3965 | -265.1498 | -2.6738 | -2.7125 |
| 0.6626 | 0.16 | 600 | 0.6882 | 0.1557 | 0.6581 | 0.1916 | 0.1146 | 0.7030 | 0.0771 | 0.3033 | -0.1288 | 0.1436 | -247.1206 | -265.4292 | -2.6704 | -2.7063 |
| 0.6734 | 0.18 | 700 | 0.6858 | 0.1192 | 0.6579 | 0.1969 | 0.1194 | 0.7040 | 0.0775 | 0.3034 | -0.1255 | 0.1428 | -246.6380 | -264.9039 | -2.6266 | -2.6663 |
| 0.6609 | 0.21 | 800 | 0.6883 | 0.1795 | 0.6530 | 0.1995 | 0.1104 | 0.7040 | 0.0891 | 0.3443 | -0.1330 | 0.1590 | -247.5411 | -264.6414 | -2.6689 | -2.7102 |
| 0.6772 | 0.24 | 900 | 0.6839 | 0.1725 | 0.6531 | 0.2022 | 0.1130 | 0.6880 | 0.0892 | 0.3504 | -0.1380 | 0.1632 | -247.2793 | -264.3728 | -2.6511 | -2.6915 |
| 0.6919 | 0.26 | 1000 | 0.6744 | 0.1283 | 0.6542 | 0.2115 | 0.1251 | 0.7010 | 0.0864 | 0.3385 | -0.1313 | 0.1574 | -246.0686 | -263.4407 | -2.6584 | -2.6966 |
| 0.6999 | 0.29 | 1100 | 0.6819 | 0.2083 | 0.6484 | 0.2098 | 0.1097 | 0.7000 | 0.1001 | 0.3740 | -0.1388 | 0.1721 | -247.6088 | -263.6143 | -2.6762 | -2.7107 |
| 0.6733 | 0.31 | 1200 | 0.6808 | 0.1924 | 0.6510 | 0.2160 | 0.1214 | 0.7030 | 0.0946 | 0.3760 | -0.1424 | 0.1725 | -246.4347 | -262.9895 | -2.6589 | -2.6920 |
| 0.6956 | 0.34 | 1300 | 0.6718 | 0.1008 | 0.6534 | 0.2214 | 0.1328 | 0.7050 | 0.0887 | 0.3487 | -0.1370 | 0.1630 | -245.3008 | -262.4492 | -2.6513 | -2.6859 |
| 0.7748 | 0.37 | 1400 | 0.6954 | 0.3217 | 0.6459 | 0.2119 | 0.1048 | 0.6950 | 0.1071 | 0.4142 | -0.1578 | 0.1906 | -248.1031 | -263.4083 | -2.6320 | -2.6663 |
| 0.6702 | 0.39 | 1500 | 0.6791 | 0.1720 | 0.6498 | 0.2232 | 0.1257 | 0.6960 | 0.0974 | 0.3797 | -0.1462 | 0.1757 | -246.0048 | -262.2763 | -2.6179 | -2.6541 |
| 0.7212 | 0.42 | 1600 | 0.6791 | 0.1329 | 0.6518 | 0.2243 | 0.1315 | 0.6950 | 0.0928 | 0.3671 | -0.1422 | 0.1706 | -245.4287 | -262.1662 | -2.6207 | -2.6537 |
| 0.6612 | 0.44 | 1700 | 0.6769 | 0.2054 | 0.6477 | 0.2247 | 0.1221 | 0.7080 | 0.1026 | 0.3983 | -0.1472 | 0.1822 | -246.3665 | -262.1213 | -2.6438 | -2.6771 |
| 0.6934 | 0.47 | 1800 | 0.6709 | 0.1501 | 0.6486 | 0.2306 | 0.1300 | 0.7040 | 0.1005 | 0.3907 | -0.1460 | 0.1797 | -245.5746 | -261.5366 | -2.6153 | -2.6494 |
| 0.671 | 0.5 | 1900 | 0.6769 | 0.2101 | 0.6465 | 0.2250 | 0.1195 | 0.7030 | 0.1055 | 0.4051 | -0.1482 | 0.1861 | -246.6336 | -262.0979 | -2.5887 | -2.6231 |
| 0.6552 | 0.52 | 2000 | 0.6781 | 0.2260 | 0.6439 | 0.2254 | 0.1140 | 0.7180 | 0.1115 | 0.4178 | -0.1505 | 0.1902 | -247.1805 | -262.0490 | -2.6150 | -2.6499 |
| 0.6727 | 0.55 | 2100 | 0.6812 | 0.2672 | 0.6421 | 0.2229 | 0.1072 | 0.7220 | 0.1157 | 0.4343 | -0.1502 | 0.1950 | -247.8637 | -262.3035 | -2.6246 | -2.6598 |
| 0.6657 | 0.58 | 2200 | 0.6809 | 0.2607 | 0.6417 | 0.2270 | 0.1102 | 0.7190 | 0.1168 | 0.4374 | -0.1518 | 0.1964 | -247.5590 | -261.8957 | -2.6197 | -2.6535 |
| 0.7128 | 0.6 | 2300 | 0.6833 | 0.2781 | 0.6414 | 0.2262 | 0.1087 | 0.7240 | 0.1175 | 0.4382 | -0.1512 | 0.1975 | -247.7124 | -261.9748 | -2.6342 | -2.6662 |
| 0.664 | 0.63 | 2400 | 0.6816 | 0.2634 | 0.6416 | 0.2271 | 0.1102 | 0.7180 | 0.1169 | 0.4368 | -0.1508 | 0.1963 | -247.5589 | -261.8823 | -2.6375 | -2.6706 |
| 0.6854 | 0.65 | 2500 | 0.6814 | 0.2573 | 0.6404 | 0.2303 | 0.1104 | 0.7180 | 0.1200 | 0.4432 | -0.1527 | 0.1993 | -247.5439 | -261.5588 | -2.6317 | -2.6642 |
| 0.6744 | 0.68 | 2600 | 0.6809 | 0.2731 | 0.6419 | 0.2299 | 0.1129 | 0.7160 | 0.1169 | 0.4482 | -0.1567 | 0.2012 | -247.2844 | -261.6073 | -2.6240 | -2.6558 |
| 0.667 | 0.71 | 2700 | 0.6720 | 0.1811 | 0.6441 | 0.2364 | 0.1252 | 0.7130 | 0.1112 | 0.4252 | -0.1508 | 0.1924 | -246.0572 | -260.9500 | -2.6329 | -2.6651 |
| 0.689 | 0.73 | 2800 | 0.6739 | 0.2081 | 0.6423 | 0.2358 | 0.1200 | 0.7080 | 0.1158 | 0.4364 | -0.1553 | 0.1984 | -246.5806 | -261.0171 | -2.6370 | -2.6691 |
| 0.6882 | 0.76 | 2900 | 0.6874 | 0.3546 | 0.6369 | 0.2245 | 0.0957 | 0.7160 | 0.1289 | 0.4704 | -0.1621 | 0.2122 | -249.0114 | -262.1393 | -2.6382 | -2.6701 |
| 0.6643 | 0.79 | 3000 | 0.6774 | 0.2362 | 0.6399 | 0.2337 | 0.1122 | 0.7160 | 0.1215 | 0.4493 | -0.1538 | 0.2028 | -247.3594 | -261.2201 | -2.6371 | -2.6686 |
| 0.6877 | 0.81 | 3100 | 0.6720 | 0.1876 | 0.6414 | 0.2372 | 0.1196 | 0.7120 | 0.1176 | 0.4373 | -0.1502 | 0.1979 | -246.6224 | -260.8720 | -2.6330 | -2.6651 |
| 0.6513 | 0.84 | 3200 | 0.6781 | 0.2526 | 0.6382 | 0.2320 | 0.1065 | 0.7200 | 0.1256 | 0.4574 | -0.1549 | 0.2061 | -247.9315 | -261.3907 | -2.6310 | -2.6631 |
| 0.6681 | 0.86 | 3300 | 0.6757 | 0.2308 | 0.6389 | 0.2340 | 0.1102 | 0.7170 | 0.1238 | 0.4528 | -0.1533 | 0.2041 | -247.5555 | -261.1891 | -2.6348 | -2.6670 |
| 0.6522 | 0.89 | 3400 | 0.6781 | 0.2483 | 0.6379 | 0.2331 | 0.1069 | 0.7190 | 0.1262 | 0.4590 | -0.1536 | 0.2064 | -247.8870 | -261.2841 | -2.6332 | -2.6655 |
| 0.7096 | 0.92 | 3500 | 0.6798 | 0.2692 | 0.6372 | 0.2322 | 0.1044 | 0.7240 | 0.1278 | 0.4646 | -0.1552 | 0.2086 | -248.1408 | -261.3742 | -2.6354 | -2.6675 |
| 0.6554 | 0.94 | 3600 | 0.6779 | 0.2514 | 0.6379 | 0.2336 | 0.1075 | 0.7200 | 0.1261 | 0.4599 | -0.1530 | 0.2065 | -247.8322 | -261.2344 | -2.6363 | -2.6684 |
| 0.7134 | 0.97 | 3700 | 0.6779 | 0.2483 | 0.6379 | 0.2337 | 0.1076 | 0.7220 | 0.1261 | 0.4594 | -0.1529 | 0.2064 | -247.8183 | -261.2257 | -2.6360 | -2.6680 |
| 0.6563 | 0.99 | 3800 | 0.6777 | 0.2476 | 0.6380 | 0.2338 | 0.1078 | 0.7240 | 0.1260 | 0.4592 | -0.1531 | 0.2063 | -247.7969 | -261.2152 | -2.6339 | -2.6662 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
