---
license: apache-2.0
base_model: mosaicml/mpt-7b-instruct
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: MPT_1000_STEPS_1e5_rate_01_beta_DPO
  results: []
---

# MPT_1000_STEPS_1e5_rate_01_beta_DPO

This model is a fine-tuned version of [mosaicml/mpt-7b-instruct](https://huggingface.co./mosaicml/mpt-7b-instruct) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.8946
- Rewards/chosen: -4.4962
- Rewards/rejected: -4.4462
- Rewards/accuracies: 0.4901
- Rewards/margins: -0.0501
- Logps/rejected: -66.0193
- Logps/chosen: -65.7547
- Logits/rejected: 8.4623
- Logits/chosen: 8.4615

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of this setup appears at the end of this card):
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7056 | 0.05 | 50 | 0.9054 | -1.8795 | -1.8769 | 0.4857 | -0.0027 | -40.3261 | -39.5876 | 13.2447 | 13.2474 |
| 1.3284 | 0.1 | 100 | 1.3365 | -5.2198 | -5.1996 | 0.4835 | -0.0202 | -73.5531 | -72.9898 | 40.0297 | 40.0297 |
| 4.0395 | 0.15 | 150 | 1.2940 | -5.6920 | -5.6131 | 0.4637 | -0.0789 | -77.6884 | -77.7120 | 34.5576 | 34.5577 |
| 1.1998 | 0.2 | 200 | 1.1437 | -4.4153 | -4.3103 | 0.4747 | -0.1050 | -64.6601 | -64.9452 | 14.5309 | 14.5309 |
| 1.0001 | 0.24 | 250 | 1.3580 | -5.0983 | -5.0232 | 0.5033 | -0.0751 | -71.7890 | -71.7751 | 24.0739 | 24.0735 |
| 1.1726 | 0.29 | 300 | 1.0394 | -4.1980 | -4.0831 | 0.4879 | -0.1149 | -62.3888 | -62.7721 | 16.4743 | 16.4742 |
| 1.0955 | 0.34 | 350 | 1.0584 | -4.9210 | -4.7783 | 0.4747 | -0.1427 | -69.3404 | -70.0020 | 20.7178 | 20.7172 |
| 1.2598 | 0.39 | 400 | 1.0408 | -3.8776 | -3.8210 | 0.4945 | -0.0566 | -59.7678 | -59.5681 | 17.0600 | 17.0587 |
| 1.2403 | 0.44 | 450 | 0.9855 | -4.8112 | -4.6991 | 0.4747 | -0.1121 | -68.5488 | -68.9046 | 10.9237 | 10.9226 |
| 1.2967 | 0.49 | 500 | 0.9814 | -4.7410 | -4.6563 | 0.4769 | -0.0846 | -68.1207 | -68.2017 | 15.1832 | 15.1825 |
| 1.152 | 0.54 | 550 | 0.9258 | -4.6800 | -4.6273 | 0.4989 | -0.0527 | -67.8303 | -67.5925 | 9.7415 | 9.7409 |
| 0.9473 | 0.59 | 600 | 0.9416 | -3.6301 | -3.6600 | 0.5341 | 0.0299 | -58.1573 | -57.0931 | 10.5794 | 10.5787 |
| 0.9534 | 0.64 | 650 | 0.9361 | -4.7539 | -4.6806 | 0.4681 | -0.0733 | -68.3630 | -68.3308 | 11.2450 | 11.2442 |
| 0.985 | 0.68 | 700 | 0.9194 | -4.5437 | -4.5232 | 0.5011 | -0.0205 | -66.7896 | -66.2292 | 9.1942 | 9.1934 |
| 0.97 | 0.73 | 750 | 0.9090 | -4.6508 | -4.5989 | 0.4835 | -0.0520 | -67.5462 | -67.3006 | 8.0813 | 8.0806 |
| 0.8148 | 0.78 | 800 | 0.8992 | -4.5695 | -4.5180 | 0.4923 | -0.0515 | -66.7373 | -66.4875 | 8.3458 | 8.3450 |
| 0.9668 | 0.83 | 850 | 0.8976 | -4.5172 | -4.4650 | 0.4901 | -0.0521 | -66.2078 | -65.9638 | 8.2885 | 8.2877 |
| 0.9438 | 0.88 | 900 | 0.8952 | -4.4950 | -4.4441 | 0.4923 | -0.0509 | -65.9988 | -65.7424 | 8.4833 | 8.4825 |
| 1.0069 | 0.93 | 950 | 0.8954 | -4.4971 | -4.4461 | 0.4901 | -0.0510 | -66.0188 | -65.7634 | 8.4615 | 8.4607 |
| 0.7377 | 0.98 | 1000 | 0.8946 | -4.4962 | -4.4462 | 0.4901 | -0.0501 | -66.0193 | -65.7547 | 8.4623 | 8.4615 |

### Framework versions

- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
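
### Training setup sketch

The dataset used for this run is not documented, so the following is a minimal sketch of the training setup rather than an exact recipe. It assumes a TRL version contemporary with Transformers 4.39 (where `DPOTrainer` still accepts `beta` directly); `your_preference_dataset` is a placeholder for a preference dataset with `prompt`, `chosen`, and `rejected` columns, and `beta=0.1` is inferred from the "01_beta" part of the model name.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mosaicml/mpt-7b-instruct"

# MPT uses custom modeling code, so trust_remote_code is required.
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
ref_model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # MPT's tokenizer has no pad token by default

# Placeholder: the actual preference dataset for this card is unknown.
dataset = load_dataset("your_preference_dataset", split="train")

# Mirrors the hyperparameters listed above; the optimizer is left at the
# Trainer default, which uses betas=(0.9, 0.999) and epsilon=1e-08.
training_args = TrainingArguments(
    output_dir="MPT_1000_STEPS_1e5_rate_01_beta_DPO",
    learning_rate=1e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,  # frozen reference policy for the DPO loss
    beta=0.1,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```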