# Mistral-7B-v0.3-dpo-10k
This model is a DPO fine-tuned version of [mistralai/Mistral-7B-v0.3](https://huggingface.co/mistralai/Mistral-7B-v0.3); the training dataset is not documented in this card. It achieves the following results on the evaluation set:
- Loss: 1.4922
- Rewards/real: -4.5129
- Rewards/generated: -4.6699
- Rewards/accuracies: 0.4423
- Rewards/margins: 0.1570
- Logps/generated: -155.2124
- Logps/real: -181.5346
- Logits/generated: -2.1203
- Logits/real: -2.3164
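
These columns follow the usual DPO bookkeeping: `Rewards/real` and `Rewards/generated` are the implicit rewards the policy assigns to the preferred (real) and rejected (generated) responses, `Rewards/margins` is their difference, and `Rewards/accuracies` is the fraction of evaluation pairs where the preferred response gets the higher reward. Assuming the standard DPO formulation (the card does not state the exact loss), with policy $\pi_\theta$, frozen reference $\pi_{\text{ref}}$, and temperature $\beta$:

$$
r_\theta(x, y) = \beta \,\log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}, \qquad
\text{Rewards/margins} = r_\theta(x, y_{\text{real}}) - r_\theta(x, y_{\text{generated}})
$$

Consistent with this, the final margin above is $0.1570 = (-4.5129) - (-4.6699)$.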
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a hypothetical trainer sketch follows the list):
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
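
The training script itself is not included in the card. As a rough illustration only, the metric names and hyperparameters above match what trl's `DPOConfig`/`DPOTrainer` would produce; the sketch below mirrors them, with the dataset path, output directory, and DPO `beta` left as placeholders or library defaults because they are not documented here.

```python
# Hypothetical sketch only: mirrors the listed hyperparameters with trl's DPOTrainer.
# The actual training script, dataset, and beta value are not documented in this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "mistralai/Mistral-7B-v0.3"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: a preference dataset with "prompt", "chosen", "rejected" columns.
train_dataset = load_dataset("json", data_files="dpo_pairs.jsonl", split="train")

config = DPOConfig(
    output_dir="Mistral-7B-v0.3-dpo-10k",
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # x 4 GPUs x 2 accumulation steps = 32 effective
    per_device_eval_batch_size=4,    # x 4 GPUs = 16 effective
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # newer trl releases rename this argument to processing_class
)
trainer.train()
```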
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
---|---|---|---|---|---|---|---|---|---|---|---|
0.7877 | 0.0992 | 31 | 0.7717 | 0.6905 | 0.4761 | 0.7308 | 0.2144 | -103.7523 | -129.5006 | -2.5223 | -2.6981 |
0.6309 | 0.1984 | 62 | 0.7320 | 2.1415 | 1.7428 | 0.7115 | 0.3987 | -91.0858 | -114.9906 | -2.5564 | -2.7396 |
0.5309 | 0.2976 | 93 | 0.7175 | 1.5709 | 0.9016 | 0.6538 | 0.6692 | -99.4969 | -120.6967 | -2.4703 | -2.6171 |
0.4323 | 0.3968 | 124 | 0.7714 | 1.9586 | 1.4739 | 0.6923 | 0.4847 | -93.7744 | -116.8195 | -2.6808 | -2.8349 |
0.297 | 0.496 | 155 | 0.7161 | 2.2903 | 1.7549 | 0.8077 | 0.5355 | -90.9648 | -113.5018 | -2.6256 | -2.7696 |
0.2144 | 0.5952 | 186 | 0.8257 | 1.6038 | 1.0213 | 0.7115 | 0.5825 | -98.3000 | -120.3671 | -2.8900 | -3.0602 |
0.2497 | 0.6944 | 217 | 0.6849 | 2.4543 | 1.7831 | 0.8077 | 0.6712 | -90.6823 | -111.8619 | -2.6469 | -2.8201 |
0.1112 | 0.7936 | 248 | 0.6993 | 2.2831 | 1.4322 | 0.7885 | 0.8508 | -94.1910 | -113.5747 | -2.7020 | -2.8645 |
0.176 | 0.8928 | 279 | 0.6700 | 2.7841 | 2.2447 | 0.7692 | 0.5394 | -86.0663 | -108.5641 | -2.8051 | -2.9280 |
0.1135 | 0.992 | 310 | 0.6956 | 2.3849 | 1.8198 | 0.7885 | 0.5651 | -90.3149 | -112.5561 | -2.8024 | -2.9203 |
0.1221 | 1.0912 | 341 | 0.7314 | 2.2046 | 1.6143 | 0.7308 | 0.5903 | -92.3708 | -114.3593 | -2.5886 | -2.7365 |
0.0864 | 1.1904 | 372 | 0.7718 | 2.3206 | 1.9459 | 0.6346 | 0.3747 | -89.0543 | -113.1994 | -2.5355 | -2.7014 |
0.0871 | 1.2896 | 403 | 0.8231 | 1.9873 | 1.7063 | 0.5962 | 0.2810 | -91.4506 | -116.5322 | -2.5240 | -2.6833 |
0.1454 | 1.3888 | 434 | 0.7980 | 1.7358 | 1.2782 | 0.6731 | 0.4576 | -95.7309 | -119.0471 | -2.4325 | -2.6120 |
0.0747 | 1.488 | 465 | 0.8086 | 1.9033 | 1.4938 | 0.6538 | 0.4094 | -93.5750 | -117.3725 | -2.3557 | -2.5683 |
0.0882 | 1.5872 | 496 | 0.9281 | 0.8252 | 0.4834 | 0.5192 | 0.3418 | -103.6798 | -128.1537 | -2.2722 | -2.4783 |
0.0693 | 1.6864 | 527 | 0.8954 | 0.5032 | -0.0439 | 0.6154 | 0.5471 | -108.9523 | -131.3737 | -2.1399 | -2.3681 |
0.0982 | 1.7856 | 558 | 0.8777 | 1.0122 | 0.5411 | 0.6538 | 0.4711 | -103.1028 | -126.2834 | -2.3326 | -2.5183 |
0.0674 | 1.8848 | 589 | 0.9360 | -0.0587 | -0.5311 | 0.5962 | 0.4724 | -113.8238 | -136.9920 | -2.3026 | -2.4848 |
0.0424 | 1.984 | 620 | 0.9421 | -0.2586 | -0.6968 | 0.5769 | 0.4382 | -115.4816 | -138.9915 | -2.2955 | -2.4846 |
0.0235 | 2.0832 | 651 | 1.0939 | -1.6766 | -2.0193 | 0.5 | 0.3428 | -128.7065 | -153.1709 | -2.2115 | -2.3974 |
0.024 | 2.1824 | 682 | 1.1491 | -2.1565 | -2.5396 | 0.5 | 0.3831 | -133.9093 | -157.9701 | -2.2049 | -2.3936 |
0.0469 | 2.2816 | 713 | 1.1324 | -2.0618 | -2.4801 | 0.5 | 0.4183 | -133.3140 | -157.0232 | -2.2161 | -2.4094 |
0.0328 | 2.3808 | 744 | 1.1837 | -2.4534 | -2.7702 | 0.4808 | 0.3168 | -136.2151 | -160.9390 | -2.2080 | -2.3952 |
0.0367 | 2.48 | 775 | 1.1779 | -2.6139 | -2.9724 | 0.4808 | 0.3585 | -138.2376 | -162.5442 | -2.1815 | -2.3777 |
0.0596 | 2.5792 | 806 | 1.2847 | -3.3490 | -3.6206 | 0.4231 | 0.2716 | -144.7193 | -169.8953 | -2.1523 | -2.3458 |
0.0395 | 2.6784 | 837 | 1.3358 | -3.6588 | -3.9010 | 0.4231 | 0.2422 | -147.5237 | -172.9937 | -2.1399 | -2.3346 |
0.0302 | 2.7776 | 868 | 1.3725 | -3.7911 | -4.0386 | 0.4231 | 0.2474 | -148.8990 | -174.3167 | -2.1529 | -2.3475 |
0.0132 | 2.8768 | 899 | 1.4969 | -4.4629 | -4.6237 | 0.4423 | 0.1607 | -154.7499 | -181.0344 | -2.1227 | -2.3178 |
0.034 | 2.976 | 930 | 1.4922 | -4.5129 | -4.6699 | 0.4423 | 0.1570 | -155.2124 | -181.5346 | -2.1203 | -2.3164 |
### Framework versions
- Transformers 4.43.3
- Pytorch 2.2.2+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
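
The card provides no usage example; the model should load like any standard Mistral causal LM with the `transformers` release listed above. A minimal inference sketch (the prompt and generation settings are illustrative only):

```python
# Minimal inference sketch; assumes the standard Mistral causal-LM layout.
# device_map="auto" requires the accelerate package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/Mistral-7B-v0.3-dpo-10k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Explain direct preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```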