---
library_name: transformers
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: OpenELM-1_1B-DPO-full-least-similar
  results: []
---

# OpenELM-1_1B-DPO-full-least-similar

This model was trained from scratch on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 201.4325
- Rewards/chosen: -684.0
- Rewards/rejected: -592.0
- Rewards/accuracies: 0.4277
- Rewards/margins: -93.0
- Logps/rejected: -59392.0
- Logps/chosen: -68608.0
- Logits/rejected: 5.5625
- Logits/chosen: 5.1562

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6914        | 0.1047 | 100  | 0.6991          | -0.3457        | -0.3398          | 0.4199             | -0.0053         | -322.0         | -352.0       | -9.4375         | -9.8125       |
| 0.6914        | 0.2094 | 200  | 17.0172         | -58.5          | -51.25           | 0.4238             | -7.3125         | -5408.0        | -6176.0      | 5.25            | 4.5312        |
| 0.6914        | 0.3141 | 300  | 97.3562         | -330.0         | -286.0           | 0.4316             | -44.75          | -28800.0       | -33280.0     | -0.9297         | -0.9531       |
| 0.6914        | 0.4188 | 400  | 103.5919        | -352.0         | -304.0           | 0.4316             | -48.0           | -30592.0       | -35328.0     | 0.4180          | 0.2930        |
| 0.6914        | 0.5236 | 500  | 109.8674        | -372.0         | -322.0           | 0.4336             | -50.75          | -32512.0       | -37632.0     | 1.4766          | 1.3047        |
| 0.6914        | 0.6283 | 600  | 116.7363        | -396.0         | -342.0           | 0.4316             | -54.0           | -34560.0       | -39936.0     | 1.8828          | 1.6641        |
| 0.6914        | 0.7330 | 700  | 123.6395        | -420.0         | -362.0           | 0.4336             | -57.25          | -36608.0       | -42240.0     | 3.1406          | 2.8438        |
| 0.6914        | 0.8377 | 800  | 130.5069        | -442.0         | -382.0           | 0.4316             | -60.25          | -38400.0       | -44544.0     | 4.1562          | 3.7656        |
| 0.6914        | 0.9424 | 900  | 137.3969        | -466.0         | -402.0           | 0.4277             | -63.5           | -40448.0       | -46848.0     | 4.25            | 3.8594        |
| 0.6914        | 1.0471 | 1000 | 143.7038        | -488.0         | -422.0           | 0.4297             | -66.5           | -42496.0       | -49152.0     | 5.6875          | 5.1562        |
| 0.6914        | 1.1518 | 1100 | 150.1531        | -510.0         | -440.0           | 0.4297             | -69.5           | -44288.0       | -51200.0     | 6.5938          | 6.0           |
| 0.6914        | 1.2565 | 1200 | 156.7057        | -532.0         | -460.0           | 0.4297             | -72.5           | -46336.0       | -53504.0     | 5.6875          | 5.1562        |
| 0.6914        | 1.3613 | 1300 | 162.7056        | -552.0         | -476.0           | 0.4336             | -75.0           | -48128.0       | -55552.0     | 5.5938          | 5.1562        |
| 0.6914        | 1.4660 | 1400 | 168.4744        | -572.0         | -494.0           | 0.4316             | -77.5           | -49664.0       | -57600.0     | 5.8438          | 5.3438        |
| 0.6914        | 1.5707 | 1500 | 173.8489        | -588.0         | -510.0           | 0.4297             | -80.0           | -51200.0       | -59392.0     | 6.0312          | 5.5312        |
| 0.6914        | 1.6754 | 1600 | 178.6226        | -608.0         | -524.0           | 0.4336             | -82.5           | -52736.0       | -60928.0     | 5.8438          | 5.375         |
| 0.6914        | 1.7801 | 1700 | 183.2413        | -620.0         | -536.0           | 0.4297             | -84.5           | -54016.0       | -62464.0     | 5.7812          | 5.3125        |
| 0.6914        | 1.8848 | 1800 | 186.8875        | -636.0         | -548.0           | 0.4277             | -86.0           | -55040.0       | -64000.0     | 5.5938          | 5.1562        |
| 0.6914        | 1.9895 | 1900 | 190.4393        | -648.0         | -560.0           | 0.4316             | -87.5           | -56064.0       | -65024.0     | 5.8125          | 5.375         |
| 0.6914        | 2.0942 | 2000 | 193.2805        | -656.0         | -568.0           | 0.4297             | -89.0           | -57088.0       | -66048.0     | 5.5312          | 5.125         |
| 0.6914        | 2.1990 | 2100 | 195.6470        | -664.0         | -576.0           | 0.4277             | -90.5           | -57600.0       | -66560.0     | 5.4688          | 5.0625        |
| 0.6914        | 2.3037 | 2200 | 197.7068        | -672.0         | -580.0           | 0.4238             | -91.0           | -58368.0       | -67584.0     | 5.4688          | 5.0625        |
| 0.6914        | 2.4084 | 2300 | 199.1925        | -676.0         | -584.0           | 0.4238             | -92.0           | -58880.0       | -68096.0     | 5.5             | 5.125         |
| 0.6914        | 2.5131 | 2400 | 200.0977        | -680.0         | -588.0           | 0.4258             | -92.5           | -59136.0       | -68096.0     | 5.5312          | 5.125         |
| 0.6914        | 2.6178 | 2500 | 200.9000        | -684.0         | -588.0           | 0.4277             | -92.5           | -59392.0       | -68608.0     | 5.5625          | 5.1562        |
| 0.6914        | 2.7225 | 2600 | 201.1795        | -684.0         | -592.0           | 0.4277             | -92.5           | -59392.0       | -68608.0     | 5.5938          | 5.1875        |
| 0.6914        | 2.8272 | 2700 | 201.3105        | -684.0         | -592.0           | 0.4277             | -93.0           | -59392.0       | -68608.0     | 5.5938          | 5.1875        |
| 0.6914        | 2.9319 | 2800 | 201.4325        | -684.0         | -592.0           | 0.4277             | -93.0           | -59392.0       | -68608.0     | 5.5625          | 5.1562        |


### Framework versions

- Transformers 4.44.2
- Pytorch 2.3.0
- Datasets 2.21.0
- Tokenizers 0.19.1
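
### Batch size arithmetic

The total batch sizes listed under the training hyperparameters are consistent with the per-device sizes, device count, and gradient accumulation steps. The sketch below reproduces that arithmetic; the helper `effective_batch_size` is a hypothetical name introduced for illustration and is not part of Transformers or TRL. It assumes the usual convention that the effective batch size is the product of the three values, and that evaluation uses no gradient accumulation.

```python
# Values copied from the training hyperparameters listed in this card.
train_batch_size = 8             # per-device training batch size
eval_batch_size = 16             # per-device evaluation batch size
num_devices = 4
gradient_accumulation_steps = 2


def effective_batch_size(per_device, devices, accumulation_steps=1):
    """Hypothetical helper: examples consumed per optimizer step (or eval step)."""
    return per_device * devices * accumulation_steps


total_train_batch_size = effective_batch_size(
    train_batch_size, num_devices, gradient_accumulation_steps
)
# Assumption: evaluation does not accumulate gradients, so the factor is 1.
total_eval_batch_size = effective_batch_size(eval_batch_size, num_devices)

print(total_train_batch_size, total_eval_batch_size)
```

Both results match the `total_train_batch_size` and `total_eval_batch_size` values reported in the hyperparameter list.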