---
license: llama2
base_model: meta-llama/Llama-2-7b-hf
tags:
- generated_from_trainer
model-index:
- name: Llama-2-7b-dpo-10k
  results: []
---

# Llama-2-7b-dpo-10k

This model is a version of [meta-llama/Llama-2-7b-hf](https://huggingface.co./meta-llama/Llama-2-7b-hf) fine-tuned with direct preference optimization (DPO) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7215
- Rewards/real: 5.3782
- Rewards/generated: 4.9113
- Rewards/accuracies: 0.6923
- Rewards/margins: 0.4668
- Logps/generated: -113.1980
- Logps/real: -125.7774
- Logits/generated: -1.1385
- Logits/real: -1.0466

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|:-------------:|:------:|:----:|:---------------:|:------------:|:-----------------:|:------------------:|:---------------:|:---------------:|:----------:|:----------------:|:-----------:|
| 0.8559        | 0.1984 | 62   | 0.8605          | 0.4128       | 0.4099            | 0.4808             | 0.0029          | -158.2126       | -175.4314  | -0.8219          | -0.6123     |
| 0.7999        | 0.3968 | 124  | 0.8323          | 1.5863       | 1.5154            | 0.5192             | 0.0709          | -147.1573       | -163.6966  | -0.8057          | -0.6067     |
| 0.7846        | 0.5952 | 186  | 0.7979          | 2.4470       | 2.3135            | 0.5577             | 0.1335          | -139.1767       | -155.0893  | -0.8686          | -0.6862     |
| 0.7916        | 0.7936 | 248  | 0.7819          | 3.0117       | 2.8464            | 0.6346             | 0.1653          | -133.8475       | -149.4422  | -0.9049          | -0.7322     |
| 0.7714        | 0.9920 | 310  | 0.7630          | 3.4214       | 3.1941            | 0.6346             | 0.2273          | -130.3704       | -145.3455  | -0.9511          | -0.7905     |
| 0.6780        | 1.1904 | 372  | 0.7552          | 3.9523       | 3.6931            | 0.6538             | 0.2592          | -125.3802       | -140.0360  | -0.9800          | -0.8279     |
| 0.6337        | 1.3888 | 434  | 0.7464          | 4.4541       | 4.1602            | 0.6346             | 0.2939          | -120.7093       | -135.0177  | -1.0279          | -0.8860     |
| 0.6575        | 1.5872 | 496  | 0.7352          | 4.8501       | 4.4918            | 0.6538             | 0.3583          | -117.3935       | -131.0585  | -1.0562          | -0.9285     |
| 0.6606        | 1.7856 | 558  | 0.7270          | 5.1119       | 4.7485            | 0.6538             | 0.3634          | -114.8267       | -128.4403  | -1.0969          | -0.9780     |
| 0.6319        | 1.9840 | 620  | 0.7260          | 5.2581       | 4.8563            | 0.6538             | 0.4018          | -113.7479       | -126.9782  | -1.0953          | -0.9815     |
| 0.5520        | 2.1824 | 682  | 0.7295          | 5.3469       | 4.9377            | 0.6731             | 0.4092          | -112.9344       | -126.0898  | -1.1133          | -1.0072     |
| 0.5541        | 2.3808 | 744  | 0.7229          | 5.4093       | 4.9819            | 0.6923             | 0.4274          | -112.4924       | -125.4664  | -1.1322          | -1.0330     |
| 0.5342        | 2.5792 | 806  | 0.7246          | 5.3967       | 4.9520            | 0.6923             | 0.4447          | -112.7909       | -125.5919  | -1.1353          | -1.0397     |
| 0.5318        | 2.7776 | 868  | 0.7229          | 5.3656       | 4.9040            | 0.6731             | 0.4615          | -113.2710       | -125.9033  | -1.1367          | -1.0427     |
| 0.5396        | 2.9760 | 930  | 0.7215          | 5.3782       | 4.9113            | 0.6923             | 0.4668          | -113.1980       | -125.7774  | -1.1385          | -1.0466     |

### Framework versions

- Transformers 4.43.3
- PyTorch 2.2.2+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
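
### Notes on the reward metrics

The `Rewards/*` columns above follow the standard DPO convention used by TRL-style trainers (an assumption based on the metric names; the card does not say which training script produced it): each reward is the β-scaled log-probability ratio of the policy against the frozen reference model, computed on the preferred ("real") and model-generated ("generated") completions. A minimal sketch, assuming summed per-sequence log-probabilities and an illustrative `beta=0.1` (the actual β is not stated on this card):

```python
import torch
import torch.nn.functional as F

def dpo_reward_stats(policy_logps_real, ref_logps_real,
                     policy_logps_gen, ref_logps_gen, beta=0.1):
    """Implicit DPO rewards and loss from per-sequence log-probabilities.

    Each argument is a 1-D tensor of summed log-probs over one batch.
    beta=0.1 is illustrative only; the card does not state the value used.
    """
    rewards_real = beta * (policy_logps_real - ref_logps_real)  # -> Rewards/real
    rewards_gen = beta * (policy_logps_gen - ref_logps_gen)     # -> Rewards/generated
    margins = rewards_real - rewards_gen                        # -> Rewards/margins
    accuracy = (margins > 0).float().mean()                     # -> Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()                        # DPO loss
    return loss, rewards_real.mean(), rewards_gen.mean(), margins.mean(), accuracy
```

Under this reading, the final evaluation row (margin 0.4668, accuracy 0.6923) means the policy assigns a higher implicit reward to the preferred completion on roughly 69% of evaluation pairs.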
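
## How to use

A minimal inference sketch with 🤗 Transformers. The repository id below is a placeholder; substitute the actual Hub path where these weights are hosted:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Llama-2-7b-dpo-10k"  # placeholder; replace with the real Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # requires `accelerate` for automatic device placement
)

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```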