---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- unsloth
- generated_from_trainer
base_model: unsloth/llama-3-8b-Instruct-bnb-4bit
model-index:
- name: dpo
  results: []
---

# dpo

This model is a DPO fine-tuned version of [unsloth/llama-3-8b-Instruct-bnb-4bit](https://huggingface.co./unsloth/llama-3-8b-Instruct-bnb-4bit) on an unspecified preference dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6257
- Rewards/chosen: 0.8141
- Rewards/rejected: 0.4945
- Rewards/accuracies: 0.6431
- Rewards/margins: 0.3196
- Logps/rejected: -229.7856
- Logps/chosen: -249.2073
- Logits/rejected: -0.6789
- Logits/chosen: -0.6135

(The reward columns are the implicit DPO rewards, i.e. scaled log-probability ratios between the policy and the reference model; Rewards/accuracies is the fraction of pairs where the chosen response receives the higher reward.)

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch mapping them onto code follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 0
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 750
- mixed_precision_training: Native AMP
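The sketch below shows how these hyperparameters would map onto a TRL `DPOTrainer` run with unsloth, following unsloth's documented DPO pattern. It is a minimal reconstruction, not the author's script: the dataset name, sequence length, LoRA rank/alpha, and DPO `beta` are not recorded in this card and are marked as assumptions.

```python
from unsloth import FastLanguageModel, PatchDPOTrainer
PatchDPOTrainer()  # patch TRL's DPOTrainer for unsloth before importing it

from datasets import load_dataset
from transformers import TrainingArguments
from trl import DPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-Instruct-bnb-4bit",
    max_seq_length=2048,  # assumption: the card does not state the sequence length
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,           # assumption: LoRA rank/alpha are not recorded in the card
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# The card does not name the training data; any preference dataset with
# "prompt", "chosen", and "rejected" text columns fits this interface.
train_dataset = load_dataset("YOUR_PREFERENCE_DATASET", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a PEFT model, TRL uses the base weights (adapter disabled) as reference
    beta=0.1,        # assumption: the DPO beta is not recorded in the card
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    args=TrainingArguments(
        per_device_train_batch_size=4,  # train_batch_size: 4
        gradient_accumulation_steps=8,  # 4 * 8 = total_train_batch_size: 32
        learning_rate=5e-5,
        lr_scheduler_type="cosine",
        warmup_steps=100,
        max_steps=750,                  # training_steps: 750
        seed=0,
        fp16=True,                      # mixed_precision_training: Native AMP
        optim="adamw_torch",            # Adam with betas=(0.9, 0.999), eps=1e-8
        output_dir="dpo",
    ),
)
trainer.train()
```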
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6904        | 0.0372 | 28   | 0.6811          | 0.2766         | 0.2476           | 0.5770             | 0.0290          | -232.2545      | -254.5816    | -0.5471         | -0.5010       |
| 0.6591        | 0.0745 | 56   | 0.6623          | 0.9939         | 0.8694           | 0.5927             | 0.1245          | -226.0365      | -247.4085    | -0.5351         | -0.4798       |
| 0.6297        | 0.1117 | 84   | 0.6542          | 1.1966         | 0.9862           | 0.6136             | 0.2104          | -224.8689      | -245.3818    | -0.4689         | -0.4120       |
| 0.5985        | 0.1489 | 112  | 0.6540          | 1.5211         | 1.2525           | 0.6087             | 0.2687          | -222.2059      | -242.1367    | -0.4989         | -0.4262       |
| 0.6603        | 0.1862 | 140  | 0.6459          | 0.7737         | 0.5130           | 0.6304             | 0.2607          | -229.6009      | -249.6110    | -0.5779         | -0.5054       |
| 0.619         | 0.2234 | 168  | 0.6411          | 0.9352         | 0.6917           | 0.6222             | 0.2435          | -227.8137      | -247.9963    | -0.5842         | -0.5261       |
| 0.6497        | 0.2606 | 196  | 0.6427          | 0.8696         | 0.6404           | 0.6282             | 0.2292          | -228.3268      | -248.6518    | -0.5798         | -0.5255       |
| 0.6014        | 0.2979 | 224  | 0.6397          | 0.8941         | 0.6357           | 0.6263             | 0.2583          | -228.3730      | -248.4069    | -0.6397         | -0.5816       |
| 0.594         | 0.3351 | 252  | 0.6361          | 0.7069         | 0.4027           | 0.6319             | 0.3043          | -230.7038      | -250.2785    | -0.6434         | -0.5848       |
| 0.5898        | 0.3723 | 280  | 0.6356          | 1.0373         | 0.7462           | 0.6278             | 0.2911          | -227.2686      | -246.9745    | -0.6340         | -0.5714       |
| 0.639         | 0.4096 | 308  | 0.6342          | 0.7199         | 0.4321           | 0.6342             | 0.2878          | -230.4095      | -250.1490    | -0.6956         | -0.6293       |
| 0.6289        | 0.4468 | 336  | 0.6363          | 0.4299         | 0.1879           | 0.6248             | 0.2420          | -232.8515      | -253.0488    | -0.6705         | -0.6155       |
| 0.6304        | 0.4840 | 364  | 0.6321          | 0.7719         | 0.5053           | 0.6435             | 0.2667          | -229.6779      | -249.6284    | -0.6279         | -0.5652       |
| 0.6126        | 0.5213 | 392  | 0.6325          | 0.5194         | 0.2033           | 0.6375             | 0.3161          | -232.6973      | -252.1539    | -0.6785         | -0.6117       |
| 0.5974        | 0.5585 | 420  | 0.6254          | 0.7418         | 0.4269           | 0.6428             | 0.3149          | -230.4618      | -249.9303    | -0.6823         | -0.6170       |
| 0.6185        | 0.5957 | 448  | 0.6267          | 0.9534         | 0.6106           | 0.6409             | 0.3428          | -228.6247      | -247.8141    | -0.6532         | -0.5866       |
| 0.604         | 0.6330 | 476  | 0.6284          | 0.8011         | 0.4691           | 0.6394             | 0.3320          | -230.0398      | -249.3374    | -0.6842         | -0.6177       |
| 0.6154        | 0.6702 | 504  | 0.6269          | 0.8353         | 0.5307           | 0.6431             | 0.3046          | -229.4234      | -248.9947    | -0.6705         | -0.6051       |
| 0.5936        | 0.7074 | 532  | 0.6277          | 0.7287         | 0.4206           | 0.6469             | 0.3082          | -230.5248      | -250.0604    | -0.6887         | -0.6226       |
| 0.6291        | 0.7447 | 560  | 0.6260          | 0.8539         | 0.5327           | 0.6439             | 0.3211          | -229.4030      | -248.8091    | -0.6758         | -0.6096       |
| 0.6169        | 0.7819 | 588  | 0.6255          | 0.8797         | 0.5669           | 0.6461             | 0.3127          | -229.0613      | -248.5513    | -0.6690         | -0.6041       |
| 0.5934        | 0.8191 | 616  | 0.6256          | 0.8582         | 0.5399           | 0.6461             | 0.3183          | -229.3312      | -248.7658    | -0.6753         | -0.6095       |
| 0.6004        | 0.8564 | 644  | 0.6257          | 0.8263         | 0.5074           | 0.6450             | 0.3189          | -229.6564      | -249.0845    | -0.6761         | -0.6110       |
| 0.6282        | 0.8936 | 672  | 0.6256          | 0.8133         | 0.4949           | 0.6442             | 0.3184          | -229.7819      | -249.2152    | -0.6748         | -0.6101       |
| 0.5572        | 0.9309 | 700  | 0.6258          | 0.8122         | 0.4938           | 0.6442             | 0.3184          | -229.7925      | -249.2255    | -0.6781         | -0.6129       |
| 0.595         | 0.9681 | 728  | 0.6256          | 0.8140         | 0.4943           | 0.6428             | 0.3197          | -229.7873      | -249.2078    | -0.6788         | -0.6134       |

### Framework versions

- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1
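The card gives no usage instructions. Since `library_name: peft` indicates this repo holds a LoRA adapter rather than full weights, a minimal inference sketch would look like the following; the adapter path is a placeholder, and loading the 4-bit base weights requires `bitsandbytes`.

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Hypothetical path or repo id for the trained adapter; substitute the real one.
adapter_id = "path/to/dpo"

# AutoPeftModelForCausalLM reads base_model_name_or_path from the adapter config
# and loads unsloth/llama-3-8b-Instruct-bnb-4bit underneath.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3-8b-Instruct-bnb-4bit")

# Llama-3 Instruct expects its chat template.
messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```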