Mistral-7B-v0.3-gen-dpo-10k
This model is a fine-tuned version of mistralai/Mistral-7B-v0.3, trained with a DPO-style preference objective on an undocumented dataset. It achieves the following results on the evaluation set (the reward metrics are explained in the sketch after this list):
- Loss: 0.4487
- Rewards/real: 7.5104
- Rewards/generated: 1.3739
- Rewards/accuracies: 0.8462
- Rewards/margins: 6.1365
- Logps/generated: -218.1546
- Logps/real: -153.6829
- Logits/generated: -1.8523
- Logits/real: -2.6888
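The reward metrics above follow the usual DPO logging convention (an interpretation, since the training code is not published): each completion is scored by the implicit DPO reward, the margin is the gap between the preferred ("real") and dispreferred ("generated") completions, and the accuracy is the fraction of pairs where the real completion scores higher. As a sanity check, 7.5104 - 1.3739 = 6.1365, matching Rewards/margins above.

```latex
% Implicit DPO reward of a completion y for prompt x, under policy \pi_\theta and
% reference model \pi_{\mathrm{ref}} (standard DPO formulation; \beta is the DPO temperature):
r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% Assumed mapping to the logged metrics:
%   rewards/real       = mean r_\theta over preferred ("real") completions
%   rewards/generated  = mean r_\theta over dispreferred ("generated") completions
%   rewards/margins    = rewards/real - rewards/generated
%   rewards/accuracies = fraction of pairs with r_\theta(real) > r_\theta(generated)
%   logps/*            = summed log-probabilities of the completions under \pi_\theta
```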
Model description
More information needed
Intended uses & limitations
More information needed
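No usage guidance is documented for this checkpoint. As a starting point, here is a minimal text-generation sketch using the transformers library; the precision, device placement, and generation settings are assumptions rather than documented recommendations.

```python
# Minimal inference sketch (assumed usage; not an officially documented example).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/Mistral-7B-v0.3-gen-dpo-10k"  # repository id as listed on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision; not documented
    device_map="auto",           # requires accelerate
)

prompt = "Explain what direct preference optimization does in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```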
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
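The metric names (rewards/real vs. rewards/generated) and the hyperparameters above are consistent with a DPO run over real/generated preference pairs. Below is a hypothetical reproduction sketch using trl's DPOTrainer; trl itself, the dataset file, the column names, and the beta value are assumptions, since the actual training script is not published.

```python
# Hypothetical sketch of a DPO run matching the hyperparameters above.
# Assumptions: trl's DPOTrainer was used, the preference pairs live in a local JSONL
# file with "prompt"/"chosen"/"rejected" columns, and beta keeps its default of 0.1.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "mistralai/Mistral-7B-v0.3"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Placeholder path; the actual 10k-pair dataset is not documented.
train_dataset = load_dataset("json", data_files="dpo_pairs_10k.jsonl", split="train")

args = DPOConfig(
    output_dir="Mistral-7B-v0.3-gen-dpo-10k",
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # x 4 GPUs x 2 accumulation steps = 32 effective
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,          # a frozen reference copy is created internally when ref_model is omitted
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # recent trl releases take processing_class= instead
)
trainer.train()
```

Launching across 4 GPUs (distributed_type: multi-GPU, num_devices: 4) would typically go through accelerate or torchrun; the exact launcher and precision settings are not documented.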
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
---|---|---|---|---|---|---|---|---|---|---|---|
0.877 | 0.0992 | 31 | 0.8103 | 0.1264 | -0.1498 | 0.7308 | 0.2762 | -233.3916 | -227.5227 | -3.0962 | -3.1732 |
0.6795 | 0.1984 | 62 | 0.6395 | 0.2309 | -1.2902 | 0.8462 | 1.5211 | -244.7957 | -226.4777 | -2.7947 | -2.9556 |
0.5656 | 0.2976 | 93 | 0.5345 | -0.4078 | -2.8686 | 0.8846 | 2.4608 | -260.5800 | -232.8651 | -2.6125 | -2.8094 |
0.5392 | 0.3968 | 124 | 0.4632 | 0.7347 | -2.5555 | 0.9038 | 3.2903 | -257.4490 | -221.4394 | -2.6252 | -2.8182 |
0.519 | 0.496 | 155 | 0.4194 | 0.8203 | -2.7159 | 0.9038 | 3.5362 | -259.0524 | -220.5834 | -2.5450 | -2.7892 |
0.4522 | 0.5952 | 186 | 0.4109 | 1.1478 | -3.3424 | 0.9038 | 4.4902 | -265.3172 | -217.3086 | -2.1889 | -2.5745 |
0.4175 | 0.6944 | 217 | 0.4189 | 1.7292 | -3.5464 | 0.8846 | 5.2756 | -267.3578 | -211.4943 | -2.0150 | -2.4367 |
0.5123 | 0.7936 | 248 | 0.4031 | 1.5992 | -2.5850 | 0.9038 | 4.1842 | -257.7434 | -212.7944 | -2.3289 | -2.6748 |
0.4467 | 0.8928 | 279 | 0.4215 | 2.1259 | -3.1648 | 0.8654 | 5.2908 | -263.5421 | -207.5273 | -1.9457 | -2.5122 |
0.432 | 0.992 | 310 | 0.3889 | 2.4989 | -2.2218 | 0.9038 | 4.7207 | -254.1118 | -203.7978 | -2.1945 | -2.6616 |
0.206 | 1.0912 | 341 | 0.3944 | 3.9149 | -1.2192 | 0.8654 | 5.1341 | -244.0859 | -189.6380 | -2.1800 | -2.7888 |
0.1884 | 1.1904 | 372 | 0.3790 | 4.2792 | -1.2916 | 0.8846 | 5.5708 | -244.8093 | -185.9946 | -2.2022 | -2.8626 |
0.1866 | 1.2896 | 403 | 0.3799 | 4.7981 | -1.0761 | 0.8654 | 5.8742 | -242.6544 | -180.8058 | -2.1602 | -2.8470 |
0.195 | 1.3888 | 434 | 0.3898 | 5.3519 | -0.2792 | 0.8462 | 5.6311 | -234.6861 | -175.2681 | -2.2097 | -2.9184 |
0.1787 | 1.488 | 465 | 0.4027 | 5.4325 | -0.2501 | 0.8462 | 5.6826 | -234.3951 | -174.4621 | -2.3064 | -2.9594 |
0.1808 | 1.5872 | 496 | 0.3806 | 5.3354 | -0.7266 | 0.8654 | 6.0620 | -239.1595 | -175.4325 | -2.1412 | -2.9272 |
0.1629 | 1.6864 | 527 | 0.3708 | 5.4311 | -0.4097 | 0.8846 | 5.8408 | -235.9910 | -174.4760 | -2.3029 | -2.9067 |
0.1993 | 1.7856 | 558 | 0.3883 | 6.1042 | 0.4673 | 0.8654 | 5.6370 | -227.2212 | -167.7442 | -2.2351 | -2.9173 |
0.1687 | 1.8848 | 589 | 0.3744 | 5.8543 | -0.2070 | 0.8846 | 6.0613 | -233.9639 | -170.2437 | -2.1107 | -2.7838 |
0.1721 | 1.984 | 620 | 0.3694 | 5.9535 | -0.1297 | 0.8846 | 6.0832 | -233.1905 | -169.2515 | -2.1660 | -2.8452 |
0.1383 | 2.0832 | 651 | 0.3671 | 6.4065 | 0.2173 | 0.8846 | 6.1892 | -229.7208 | -164.7217 | -1.9638 | -2.7078 |
0.1365 | 2.1824 | 682 | 0.3923 | 7.0262 | 0.7925 | 0.8654 | 6.2337 | -223.9683 | -158.5246 | -1.8941 | -2.6932 |
0.1396 | 2.2816 | 713 | 0.4887 | 7.4973 | 2.3669 | 0.8077 | 5.1304 | -208.2246 | -153.8134 | -2.1548 | -2.9114 |
0.1397 | 2.3808 | 744 | 0.4148 | 7.3082 | 1.2357 | 0.8462 | 6.0725 | -219.5363 | -155.7047 | -1.9055 | -2.6924 |
0.1351 | 2.48 | 775 | 0.4137 | 7.3508 | 1.1950 | 0.8654 | 6.1558 | -219.9435 | -155.2784 | -1.8891 | -2.6999 |
0.14 | 2.5792 | 806 | 0.4429 | 7.5628 | 1.7969 | 0.8654 | 5.7659 | -213.9247 | -153.1584 | -1.9732 | -2.7988 |
0.1303 | 2.6784 | 837 | 0.4819 | 7.7271 | 2.2012 | 0.8654 | 5.5260 | -209.8819 | -151.5153 | -1.9800 | -2.8090 |
0.1329 | 2.7776 | 868 | 0.4405 | 7.4045 | 1.1559 | 0.8462 | 6.2486 | -220.3349 | -154.7421 | -1.8232 | -2.6413 |
0.1353 | 2.8768 | 899 | 0.4549 | 7.5125 | 1.3390 | 0.8462 | 6.1735 | -218.5042 | -153.6622 | -1.8288 | -2.6739 |
0.1319 | 2.976 | 930 | 0.4487 | 7.5104 | 1.3739 | 0.8462 | 6.1365 | -218.1546 | -153.6829 | -1.8523 | -2.6888 |
Framework versions
- Transformers 4.43.3
- Pytorch 2.2.2+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1