---
library_name: transformers
license: apache-2.0
base_model: tsavage68/IE_M2_1000steps_1e7rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: IE_M2_1000steps_1e7rate_03beta_SFT
  results: []
---

# IE_M2_1000steps_1e7rate_03beta_SFT

This model is a fine-tuned version of [tsavage68/IE_M2_1000steps_1e7rate_SFT](https://huggingface.co/tsavage68/IE_M2_1000steps_1e7rate_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3743
- Rewards/chosen: -0.4432
- Rewards/rejected: -6.7623
- Rewards/accuracies: 0.4600
- Rewards/margins: 6.3191
- Logps/rejected: -63.5627
- Logps/chosen: -43.6829
- Logits/rejected: -2.8851
- Logits/chosen: -2.8225

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
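Below is a minimal sketch of a `trl` DPO setup matching these hyperparameters, for readers who want to reproduce the configuration. Two assumptions are not documented on this card: the beta value of 0.3 is inferred from the "03beta" in the model name, and the preference dataset is a hypothetical placeholder.

```python
# Sketch of a DPO run with the hyperparameters listed above (trl of the era
# contemporary with the framework versions below). Assumptions: beta=0.3 is
# inferred from the model name, and the dataset path is a placeholder.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/IE_M2_1000steps_1e7rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

config = DPOConfig(
    output_dir="IE_M2_1000steps_1e7rate_03beta_SFT",
    beta=0.3,                       # assumption: implied by "03beta" in the name
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

# Hypothetical preference dataset with "prompt"/"chosen"/"rejected" columns.
train_dataset = load_dataset("your/preference-dataset", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # trl clones the policy as the frozen reference model
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```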
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4682        | 0.4   | 50   | 0.3782          | -0.1103        | -2.3818          | 0.4600             | 2.2716          | -48.9613       | -42.5731     | -2.9040         | -2.8424       |
| 0.3812        | 0.8   | 100  | 0.3743          | -0.3057        | -5.2338          | 0.4600             | 4.9281          | -58.4679       | -43.2247     | -2.8913         | -2.8290       |
| 0.3119        | 1.2   | 150  | 0.3743          | -0.4620        | -6.2918          | 0.4600             | 5.8298          | -61.9944       | -43.7454     | -2.8899         | -2.8276       |
| 0.3639        | 1.6   | 200  | 0.3743          | -0.4045        | -6.1963          | 0.4600             | 5.7918          | -61.6762       | -43.5540     | -2.8874         | -2.8248       |
| 0.4332        | 2.0   | 250  | 0.3743          | -0.4216        | -6.3719          | 0.4600             | 5.9503          | -62.2614       | -43.6108     | -2.8860         | -2.8234       |
| 0.3986        | 2.4   | 300  | 0.3743          | -0.4257        | -6.4310          | 0.4600             | 6.0053          | -62.4585       | -43.6244     | -2.8858         | -2.8233       |
| 0.3986        | 2.8   | 350  | 0.3743          | -0.4206        | -6.4901          | 0.4600             | 6.0695          | -62.6555       | -43.6075     | -2.8857         | -2.8232       |
| 0.4505        | 3.2   | 400  | 0.3743          | -0.4331        | -6.5613          | 0.4600             | 6.1281          | -62.8927       | -43.6493     | -2.8859         | -2.8233       |
| 0.4505        | 3.6   | 450  | 0.3743          | -0.4385        | -6.6329          | 0.4600             | 6.1945          | -63.1316       | -43.6671     | -2.8854         | -2.8229       |
| 0.4332        | 4.0   | 500  | 0.3743          | -0.4451        | -6.6895          | 0.4600             | 6.2444          | -63.3203       | -43.6893     | -2.8853         | -2.8227       |
| 0.3292        | 4.4   | 550  | 0.3743          | -0.4424        | -6.7191          | 0.4600             | 6.2766          | -63.4188       | -43.6803     | -2.8853         | -2.8227       |
| 0.3639        | 4.8   | 600  | 0.3743          | -0.4424        | -6.7393          | 0.4600             | 6.2969          | -63.4861       | -43.6801     | -2.8854         | -2.8228       |
| 0.4505        | 5.2   | 650  | 0.3743          | -0.4464        | -6.7495          | 0.4600             | 6.3031          | -63.5201       | -43.6934     | -2.8852         | -2.8225       |
| 0.4505        | 5.6   | 700  | 0.3743          | -0.4436        | -6.7510          | 0.4600             | 6.3074          | -63.5251       | -43.6842     | -2.8853         | -2.8227       |
| 0.3639        | 6.0   | 750  | 0.3743          | -0.4452        | -6.7582          | 0.4600             | 6.3130          | -63.5491       | -43.6895     | -2.8852         | -2.8225       |
| 0.2426        | 6.4   | 800  | 0.3743          | -0.4492        | -6.7644          | 0.4600             | 6.3152          | -63.5699       | -43.7027     | -2.8854         | -2.8227       |
| 0.5025        | 6.8   | 850  | 0.3743          | -0.4443        | -6.7593          | 0.4600             | 6.3150          | -63.5528       | -43.6864     | -2.8850         | -2.8224       |
| 0.3119        | 7.2   | 900  | 0.3743          | -0.4434        | -6.7628          | 0.4600             | 6.3194          | -63.5646       | -43.6836     | -2.8853         | -2.8226       |
| 0.3466        | 7.6   | 950  | 0.3743          | -0.4431        | -6.7625          | 0.4600             | 6.3194          | -63.5635       | -43.6825     | -2.8851         | -2.8225       |
| 0.3812        | 8.0   | 1000 | 0.3743          | -0.4432        | -6.7623          | 0.4600             | 6.3191          | -63.5627       | -43.6829     | -2.8851         | -2.8225       |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1
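### How to use

A minimal inference sketch for loading the published checkpoint with `transformers`; the prompt and generation settings below are illustrative only and are not specified by this card.

```python
# Sketch: load the published checkpoint and generate. The prompt is a
# placeholder; no prompt format is documented for this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tsavage68/IE_M2_1000steps_1e7rate_03beta_SFT"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.float16,  # illustrative; requires a GPU for fp16
    device_map="auto",          # requires the accelerate package
)

inputs = tokenizer("Your prompt here", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```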