---
license: other
base_model: lewtun/gemma-7b-sft-full-ultrachat-v0
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: gemma-7b-dpo-full-ultrafeedback-beta-0.01
  results: []
---

# gemma-7b-dpo-full-ultrafeedback-beta-0.01

This model is a fine-tuned version of [lewtun/gemma-7b-sft-full-ultrachat-v0](https://huggingface.co/lewtun/gemma-7b-sft-full-ultrachat-v0) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4698
- Rewards/chosen: -1.0027
- Rewards/rejected: -2.3339
- Rewards/accuracies: 0.7698
- Rewards/margins: 1.3312
- Logps/rejected: -1118.8601
- Logps/chosen: -1006.0907
- Logits/rejected: 90.6424
- Logits/chosen: 105.6680

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.552         | 0.21  | 100  | 0.5756          | -2.8657        | -3.5901          | 0.7460             | 0.7243          | -1244.4771     | -1192.3933   | 82.3244         | 96.5612       |
| 0.501         | 0.42  | 200  | 0.4914          | -1.6427        | -2.6660          | 0.7817             | 1.0233          | -1152.0745     | -1070.0895   | 91.1202         | 105.1467      |
| 0.4893        | 0.63  | 300  | 0.4810          | -1.6604        | -2.8398          | 0.7619             | 1.1794          | -1169.4480     | -1071.8550   | 87.4237         | 101.9799      |
| 0.4759        | 0.84  | 400  | 0.4718          | -0.8508        | -2.1538          | 0.7817             | 1.3030          | -1100.8470     | -990.8950    | 89.1600         | 104.0108      |

### Framework versions

- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.1
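
### Checking the reported numbers

The totals in the hyperparameter list and the reward margin in the evaluation results follow from the other reported values. The sketch below reproduces that arithmetic. It assumes that `train_batch_size` and `eval_batch_size` are per-device values and that the margin is the chosen reward minus the rejected reward. Both assumptions are consistent with the reported numbers, but the card itself does not state them. The helper name `effective_batch_size` is hypothetical and exists only for this illustration.

```python
# Values copied from the hyperparameter list and evaluation results in this card.
train_batch_size = 2             # assumed to be per device
eval_batch_size = 4              # assumed to be per device
num_devices = 8
gradient_accumulation_steps = 8

rewards_chosen = -1.0027
rewards_rejected = -2.3339


def effective_batch_size(per_device, devices, accumulation_steps=1):
    """Hypothetical helper: samples consumed per optimizer (or eval) step."""
    return per_device * devices * accumulation_steps


# Training accumulates gradients over several steps; evaluation does not.
total_train = effective_batch_size(
    train_batch_size, num_devices, gradient_accumulation_steps
)
total_eval = effective_batch_size(eval_batch_size, num_devices)

# Assumption: margin = reward of chosen response minus reward of rejected response.
margin = round(rewards_chosen - rewards_rejected, 4)

print(total_train)  # 128, matches total_train_batch_size
print(total_eval)   # 32, matches total_eval_batch_size
print(margin)       # 1.3312, matches Rewards/margins
```

Applying the same subtraction to the rows of the training results table can differ from the reported margin in the last decimal place (for example at step 100), presumably because each column is rounded independently.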