Mistral-7B-v0.3-gen-dpo-10k

This model is a fine-tuned version of mistralai/Mistral-7B-v0.3 on an unknown dataset. At the final logged evaluation (step 930, epoch 2.976), it achieves the following results on the evaluation set:

  • Loss: 0.4487
  • Rewards/real: 7.5104
  • Rewards/generated: 1.3739
  • Rewards/accuracies: 0.8462
  • Rewards/margins: 6.1365
  • Logps/generated: -218.1546
  • Logps/real: -153.6829
  • Logits/generated: -1.8523
  • Logits/real: -2.6888
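
As a reading aid for the metrics above, here is a minimal sketch that checks how the reward figures relate. It assumes the usual DPO logging convention, in which the margin is the reward of the "real" response minus the reward of the "generated" one, and it assumes the standard sigmoid DPO loss; the card does not state either, and the helper name dpo_pair_loss is hypothetical.

```python
import math

# Final evaluation metrics copied from the list above.
REWARDS_REAL = 7.5104
REWARDS_GENERATED = 1.3739
REPORTED_MARGIN = 6.1365
REPORTED_LOSS = 0.4487

# Assumption: margin = reward(real) - reward(generated).
margin = REWARDS_REAL - REWARDS_GENERATED


def dpo_pair_loss(pair_margin: float) -> float:
    """Sigmoid DPO loss for one pair, -log(sigmoid(m)), in a stable form."""
    # softplus(-m) = max(-m, 0) + log(1 + exp(-|m|))
    return max(-pair_margin, 0.0) + math.log1p(math.exp(-abs(pair_margin)))


if __name__ == "__main__":
    print(f"recomputed margin: {margin:.4f} (reported {REPORTED_MARGIN})")
    print(f"loss at the mean margin: {dpo_pair_loss(margin):.4f}")
    print(f"reported mean loss: {REPORTED_LOSS}")
```

Under these assumptions the recomputed margin matches the reported one. The loss evaluated at the mean margin is far below the reported mean loss, which is expected: the loss is convex in the margin, so the pairs the model still ranks incorrectly (about 15% by Rewards/accuracies) contribute most of the average.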

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
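
The two derived batch sizes in the list follow from the per-device values, and the learning rate follows a linear warmup and decay. The sketch below reproduces both. The function names are hypothetical, and TOTAL_STEPS = 938 is an estimate rather than a value from the card: the training log reaches step 930 at epoch 2.976, which implies roughly 312.5 optimizer steps per epoch. The schedule is written to mirror the linear-with-warmup rule used by the Transformers Trainer, with warmup steps taken as the ceiling of warmup_ratio times the total step count.

```python
import math

PEAK_LR = 5e-07
WARMUP_RATIO = 0.1
TOTAL_STEPS = 938  # assumption: estimated from step 930 at epoch 2.976


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Examples consumed per optimizer step (or per eval step if grad_accum=1)."""
    return per_device * num_devices * grad_accum


def linear_warmup_lr(step: int, total_steps: int = TOTAL_STEPS,
                     peak_lr: float = PEAK_LR,
                     warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given step: linear warmup to peak_lr, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    print("train batch:", effective_batch_size(4, 4, 2))  # 4 per device x 4 GPUs x 2 accumulation
    print("eval batch:", effective_batch_size(4, 4))      # no accumulation at eval time
    for s in (0, 47, 94, 500, 938):
        print(s, linear_warmup_lr(s))
```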

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.877 | 0.0992 | 31 | 0.8103 | 0.1264 | -0.1498 | 0.7308 | 0.2762 | -233.3916 | -227.5227 | -3.0962 | -3.1732 |
| 0.6795 | 0.1984 | 62 | 0.6395 | 0.2309 | -1.2902 | 0.8462 | 1.5211 | -244.7957 | -226.4777 | -2.7947 | -2.9556 |
| 0.5656 | 0.2976 | 93 | 0.5345 | -0.4078 | -2.8686 | 0.8846 | 2.4608 | -260.5800 | -232.8651 | -2.6125 | -2.8094 |
| 0.5392 | 0.3968 | 124 | 0.4632 | 0.7347 | -2.5555 | 0.9038 | 3.2903 | -257.4490 | -221.4394 | -2.6252 | -2.8182 |
| 0.519 | 0.496 | 155 | 0.4194 | 0.8203 | -2.7159 | 0.9038 | 3.5362 | -259.0524 | -220.5834 | -2.5450 | -2.7892 |
| 0.4522 | 0.5952 | 186 | 0.4109 | 1.1478 | -3.3424 | 0.9038 | 4.4902 | -265.3172 | -217.3086 | -2.1889 | -2.5745 |
| 0.4175 | 0.6944 | 217 | 0.4189 | 1.7292 | -3.5464 | 0.8846 | 5.2756 | -267.3578 | -211.4943 | -2.0150 | -2.4367 |
| 0.5123 | 0.7936 | 248 | 0.4031 | 1.5992 | -2.5850 | 0.9038 | 4.1842 | -257.7434 | -212.7944 | -2.3289 | -2.6748 |
| 0.4467 | 0.8928 | 279 | 0.4215 | 2.1259 | -3.1648 | 0.8654 | 5.2908 | -263.5421 | -207.5273 | -1.9457 | -2.5122 |
| 0.432 | 0.992 | 310 | 0.3889 | 2.4989 | -2.2218 | 0.9038 | 4.7207 | -254.1118 | -203.7978 | -2.1945 | -2.6616 |
| 0.206 | 1.0912 | 341 | 0.3944 | 3.9149 | -1.2192 | 0.8654 | 5.1341 | -244.0859 | -189.6380 | -2.1800 | -2.7888 |
| 0.1884 | 1.1904 | 372 | 0.3790 | 4.2792 | -1.2916 | 0.8846 | 5.5708 | -244.8093 | -185.9946 | -2.2022 | -2.8626 |
| 0.1866 | 1.2896 | 403 | 0.3799 | 4.7981 | -1.0761 | 0.8654 | 5.8742 | -242.6544 | -180.8058 | -2.1602 | -2.8470 |
| 0.195 | 1.3888 | 434 | 0.3898 | 5.3519 | -0.2792 | 0.8462 | 5.6311 | -234.6861 | -175.2681 | -2.2097 | -2.9184 |
| 0.1787 | 1.488 | 465 | 0.4027 | 5.4325 | -0.2501 | 0.8462 | 5.6826 | -234.3951 | -174.4621 | -2.3064 | -2.9594 |
| 0.1808 | 1.5872 | 496 | 0.3806 | 5.3354 | -0.7266 | 0.8654 | 6.0620 | -239.1595 | -175.4325 | -2.1412 | -2.9272 |
| 0.1629 | 1.6864 | 527 | 0.3708 | 5.4311 | -0.4097 | 0.8846 | 5.8408 | -235.9910 | -174.4760 | -2.3029 | -2.9067 |
| 0.1993 | 1.7856 | 558 | 0.3883 | 6.1042 | 0.4673 | 0.8654 | 5.6370 | -227.2212 | -167.7442 | -2.2351 | -2.9173 |
| 0.1687 | 1.8848 | 589 | 0.3744 | 5.8543 | -0.2070 | 0.8846 | 6.0613 | -233.9639 | -170.2437 | -2.1107 | -2.7838 |
| 0.1721 | 1.984 | 620 | 0.3694 | 5.9535 | -0.1297 | 0.8846 | 6.0832 | -233.1905 | -169.2515 | -2.1660 | -2.8452 |
| 0.1383 | 2.0832 | 651 | 0.3671 | 6.4065 | 0.2173 | 0.8846 | 6.1892 | -229.7208 | -164.7217 | -1.9638 | -2.7078 |
| 0.1365 | 2.1824 | 682 | 0.3923 | 7.0262 | 0.7925 | 0.8654 | 6.2337 | -223.9683 | -158.5246 | -1.8941 | -2.6932 |
| 0.1396 | 2.2816 | 713 | 0.4887 | 7.4973 | 2.3669 | 0.8077 | 5.1304 | -208.2246 | -153.8134 | -2.1548 | -2.9114 |
| 0.1397 | 2.3808 | 744 | 0.4148 | 7.3082 | 1.2357 | 0.8462 | 6.0725 | -219.5363 | -155.7047 | -1.9055 | -2.6924 |
| 0.1351 | 2.48 | 775 | 0.4137 | 7.3508 | 1.1950 | 0.8654 | 6.1558 | -219.9435 | -155.2784 | -1.8891 | -2.6999 |
| 0.14 | 2.5792 | 806 | 0.4429 | 7.5628 | 1.7969 | 0.8654 | 5.7659 | -213.9247 | -153.1584 | -1.9732 | -2.7988 |
| 0.1303 | 2.6784 | 837 | 0.4819 | 7.7271 | 2.2012 | 0.8654 | 5.5260 | -209.8819 | -151.5153 | -1.9800 | -2.8090 |
| 0.1329 | 2.7776 | 868 | 0.4405 | 7.4045 | 1.1559 | 0.8462 | 6.2486 | -220.3349 | -154.7421 | -1.8232 | -2.6413 |
| 0.1353 | 2.8768 | 899 | 0.4549 | 7.5125 | 1.3390 | 0.8462 | 6.1735 | -218.5042 | -153.6622 | -1.8288 | -2.6739 |
| 0.1319 | 2.976 | 930 | 0.4487 | 7.5104 | 1.3739 | 0.8462 | 6.1365 | -218.1546 | -153.6829 | -1.8523 | -2.6888 |
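
To make the training results easier to scan, the sketch below picks the evaluation step with the lowest validation loss. The (step, validation loss) pairs are copied from the results above; the helper name best_step is hypothetical, and nothing in the card says whether intermediate checkpoints were saved or published.

```python
# (step, validation loss) pairs copied from the training results.
VAL_LOSS_BY_STEP = [
    (31, 0.8103), (62, 0.6395), (93, 0.5345), (124, 0.4632), (155, 0.4194),
    (186, 0.4109), (217, 0.4189), (248, 0.4031), (279, 0.4215), (310, 0.3889),
    (341, 0.3944), (372, 0.3790), (403, 0.3799), (434, 0.3898), (465, 0.4027),
    (496, 0.3806), (527, 0.3708), (558, 0.3883), (589, 0.3744), (620, 0.3694),
    (651, 0.3671), (682, 0.3923), (713, 0.4887), (744, 0.4148), (775, 0.4137),
    (806, 0.4429), (837, 0.4819), (868, 0.4405), (899, 0.4549), (930, 0.4487),
]


def best_step(history):
    """Return the (step, loss) pair with the lowest validation loss."""
    return min(history, key=lambda pair: pair[1])


if __name__ == "__main__":
    step, loss = best_step(VAL_LOSS_BY_STEP)
    final_step, final_loss = VAL_LOSS_BY_STEP[-1]
    print(f"lowest validation loss: {loss} at step {step}")
    print(f"final validation loss:  {final_loss} at step {final_step}")
```

The minimum, 0.3671, occurs at step 651 (epoch 2.0832), while the final step reports 0.4487. Training loss keeps falling over the same interval, which may indicate overfitting during the third epoch, although reward margins stay roughly flat.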

Framework versions

  • Transformers 4.43.3
  • PyTorch 2.2.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
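
When trying to reproduce the training environment, it can help to compare installed packages against the versions listed above. The following is a small sketch using only the standard library; the function names are hypothetical, and the PyTorch local build tag (+cu121) is stripped before comparison, so a CPU or different-CUDA build of 2.2.2 would also count as a match.

```python
from importlib import metadata

# Versions from the Framework versions list, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.43.3",
    "torch": "2.2.2",  # card lists 2.2.2+cu121; the local tag is ignored here
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def public_version(version: str) -> str:
    """Drop a local build tag such as '+cu121'."""
    return version.split("+", 1)[0]


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: (wanted, installed_or_None)} for every package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed is None or public_version(installed) != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in find_mismatches(EXPECTED).items():
        print(f"{package}: expected {wanted}, found {installed}")
```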

Model size: 7.25B params (Safetensors, tensor type BF16)
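
Given the reported size and tensor type, a loading sketch might look like the following. It assumes the checkpoint loads through the standard Transformers auto classes and that the repository ships tokenizer files; the card does not confirm either. The repository ID is the one shown on the model page, and the function names are hypothetical. The heavy imports sit inside load_model so the memory estimate can run without PyTorch installed.

```python
MODEL_ID = "AmberYifan/Mistral-7B-v0.3-gen-dpo-10k"


def approx_weight_gib(num_params: float = 7.25e9, bytes_per_param: int = 2) -> float:
    """Rough size of the weights alone; BF16 stores 2 bytes per parameter."""
    return num_params * bytes_per_param / 2**30


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model in bfloat16 (downloads the weights on first use)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return tokenizer, model


if __name__ == "__main__":
    print(f"weights alone need about {approx_weight_gib():.1f} GiB")
```

The estimate covers weights only; activations and the key-value cache add to it at inference time.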

Model repository: AmberYifan/Mistral-7B-v0.3-gen-dpo-10k