---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- generation/UF6k
model-index:
- name: zephyr-7b-dpo-oursuf6k-qlora-5e-6
  results: []
---

# zephyr-7b-dpo-oursuf6k-qlora-5e-6

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co./alignment-handbook/zephyr-7b-sft-full) on the generation/UF6k dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5972
- Rewards/chosen: -2.1562
- Rewards/rejected: -2.9617
- Rewards/accuracies: 0.6865
- Rewards/margins: 0.8055
- Rewards/margins Max: 2.4253
- Rewards/margins Min: -0.7592
- Rewards/margins Std: 1.4237
- Logps/rejected: -555.3531
- Logps/chosen: -500.8411
- Logits/rejected: -1.8009
- Logits/chosen: -1.8449

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 16
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5821        | 0.15  | 100  | 0.6622          | -0.1867        | -0.2755          | 0.6151             | 0.0888          | 0.3953              | -0.1795             | 0.2588              | -286.7321      | -303.8942    | -2.6966         | -2.7355       |
| 0.481         | 0.29  | 200  | 0.6257          | -1.2575        | -1.6473          | 0.6746             | 0.3898          | 1.3412              | -0.5259             | 0.8283              | -423.9109      | -410.9716    | -2.5402         | -2.5661       |
| 0.4017        | 0.44  | 300  | 0.6112          | -1.7680        | -2.5016          | 0.6944             | 0.7336          | 2.3329              | -0.8123             | 1.4011              | -509.3477      | -462.0217    | -1.9880         | -2.0224       |
| 0.3427        | 0.58  | 400  | 0.5955          | -1.9140        | -2.6859          | 0.7024             | 0.7719          | 2.2721              | -0.7218             | 1.3401              | -527.7765      | -476.6219    | -1.9447         | -1.9863       |
| 0.3246        | 0.73  | 500  | 0.6026          | -2.2815        | -3.0194          | 0.6627             | 0.7379          | 2.2879              | -0.7821             | 1.3716              | -561.1234      | -513.3748    | -1.8444         | -1.8864       |
| 0.2747        | 0.88  | 600  | 0.5973          | -2.1734        | -2.9762          | 0.6786             | 0.8029          | 2.4273              | -0.7515             | 1.4233              | -556.8073      | -502.5607    | -1.7934         | -1.8380       |

### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
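As a reading aid for the metrics above, the sketch below shows how the reported reward columns relate under the standard sigmoid DPO objective, using the eval-set numbers from this card. This is an illustration, not the training code: it assumes the rewards are already beta-scaled log-probability ratios (as TRL's `DPOTrainer` logs them); the beta value itself is not stated in this card.

```python
import math

# Eval-set reward metrics copied from the card above.
reward_chosen = -2.1562    # Rewards/chosen
reward_rejected = -2.9617  # Rewards/rejected

# Rewards/margins is the difference between the two.
margin = reward_chosen - reward_rejected  # -> 0.8055, matching the card

# Per-example sigmoid DPO loss evaluated at this margin: -log(sigmoid(margin)).
loss_at_mean_margin = -math.log(1.0 / (1.0 + math.exp(-margin)))

# Note: the card's eval Loss (0.5972) is the mean of per-example losses,
# which is larger than the loss at the mean margin (~0.369) because the
# loss is nonlinear in the margin (Jensen's inequality).
print(round(margin, 4), round(loss_at_mean_margin, 4))
```

The gap between the loss at the mean margin and the reported mean loss reflects the spread of margins across examples, which is also visible in the Rewards/margins Min/Max/Std columns.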