---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: mistralit2_1000_STEPS_1e6_rate_05_beta_DPO
    results: []
---

mistralit2_1000_STEPS_1e6_rate_05_beta_DPO

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unknown dataset. It achieves the following results on the evaluation set (a sketch of how these DPO reward metrics are computed follows the list):

  • Loss: 0.6223
  • Rewards/chosen: -1.9087
  • Rewards/rejected: -2.8966
  • Rewards/accuracies: 0.6593
  • Rewards/margins: 0.9879
  • Logps/rejected: -34.3656
  • Logps/chosen: -27.2032
  • Logits/rejected: -2.8455
  • Logits/chosen: -2.8459
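
The reward columns above are TRL's standard DPO bookkeeping: each reward is the beta-scaled gap between the policy's and the frozen reference model's summed log-probability for a completion, and the loss is the sigmoid DPO objective over the reward margin. A minimal sketch of that arithmetic, assuming beta = 0.5 as the "05_beta" suffix in the model name suggests (the function and tensor names are illustrative, not TRL internals):

```python
import torch
import torch.nn.functional as F

beta = 0.5  # assumed from the "05_beta" suffix in the model name

def dpo_eval_metrics(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps):
    """Implicit DPO rewards and the evaluation metrics reported above."""
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)        # Rewards/chosen
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = rewards_chosen - rewards_rejected                             # Rewards/margins
    loss = -F.logsigmoid(margins).mean()                                    # Loss (sigmoid DPO objective)
    accuracy = (margins > 0).float().mean()                                 # Rewards/accuracies
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracy
```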

Model description

More information needed

Intended uses & limitations

More information needed
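
Since no usage guidance is documented, the following is only a minimal inference sketch: it assumes the checkpoint is published under the repo id formed from the author and model name on this card, and that the tokenizer keeps Mistral-Instruct-v0.2's [INST] chat template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the author and model name on this card.
model_id = "tsavage68/mistralit2_1000_STEPS_1e6_rate_05_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The tokenizer's chat template renders the Mistral [INST] ... [/INST] format.
messages = [{"role": "user", "content": "Explain DPO fine-tuning in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```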

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged mapping onto a TRL trainer setup is sketched after the list):

  • learning_rate: 1e-07
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
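
A sketch of how these values map onto a TRL DPOTrainer run, assuming a TRL release contemporary with the pinned Transformers 4.38.2 below (where DPOTrainer takes beta as a direct kwarg). The preference dataset is undocumented, so it appears as a hypothetical placeholder; the Adam betas and epsilon above are the optimizer defaults and need no explicit setting.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)      # policy to fine-tune
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen DPO reference

# Hypothetical placeholder: any prompt/chosen/rejected preference set in
# TRL's expected format; the actual training data is not documented.
dataset = load_dataset("your-org/your-preference-dataset")

training_args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e6_rate_05_beta_DPO",
    learning_rate=1e-7,              # as listed above (the name's "1e6" differs)
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # total train batch size: 4 * 2 = 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=training_args,
    beta=0.5,                        # assumed from the "05_beta" suffix
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```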

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------|:------|:-----|:----------------|:---------------|:-----------------|:-------------------|:----------------|:---------------|:-------------|:----------------|:--------------|
| 0.6684 | 0.1  | 50   | 0.6660 | -0.2264 | -0.2957 | 0.5934 | 0.0693 | -29.1637 | -23.8386 | -2.8636 | -2.8639 |
| 0.5945 | 0.2  | 100  | 0.6396 | -1.5064 | -1.9635 | 0.6044 | 0.4572 | -32.4994 | -26.3985 | -2.8444 | -2.8447 |
| 0.4899 | 0.29 | 150  | 0.6602 | -2.2474 | -2.9308 | 0.6022 | 0.6835 | -34.4341 | -27.8806 | -2.8445 | -2.8448 |
| 0.5517 | 0.39 | 200  | 0.6024 | -0.7758 | -1.2571 | 0.6418 | 0.4813 | -31.0867 | -24.9374 | -2.8613 | -2.8616 |
| 0.6385 | 0.49 | 250  | 0.5703 | -0.5516 | -1.1264 | 0.6703 | 0.5749 | -30.8253 | -24.4890 | -2.8571 | -2.8574 |
| 0.5653 | 0.59 | 300  | 0.5989 | -1.4256 | -2.1727 | 0.6440 | 0.7471 | -32.9178 | -26.2370 | -2.8464 | -2.8467 |
| 0.5255 | 0.68 | 350  | 0.6054 | -1.6264 | -2.4443 | 0.6484 | 0.8179 | -33.4610 | -26.6386 | -2.8533 | -2.8536 |
| 0.6612 | 0.78 | 400  | 0.6157 | -1.7163 | -2.5329 | 0.6418 | 0.8166 | -33.6383 | -26.8185 | -2.8530 | -2.8533 |
| 0.646  | 0.88 | 450  | 0.6016 | -1.1753 | -1.8651 | 0.6440 | 0.6898 | -32.3026 | -25.7364 | -2.8525 | -2.8529 |
| 0.5146 | 0.98 | 500  | 0.5957 | -1.1531 | -1.8752 | 0.6484 | 0.7221 | -32.3227 | -25.6920 | -2.8553 | -2.8556 |
| 0.297  | 1.07 | 550  | 0.5863 | -1.2310 | -2.0319 | 0.6571 | 0.8009 | -32.6362 | -25.8478 | -2.8539 | -2.8542 |
| 0.2709 | 1.17 | 600  | 0.6234 | -1.7413 | -2.6395 | 0.6527 | 0.8982 | -33.8514 | -26.8684 | -2.8489 | -2.8493 |
| 0.4008 | 1.27 | 650  | 0.6173 | -1.8482 | -2.8001 | 0.6549 | 0.9519 | -34.1726 | -27.0823 | -2.8472 | -2.8476 |
| 0.2846 | 1.37 | 700  | 0.6222 | -1.8576 | -2.8175 | 0.6505 | 0.9599 | -34.2075 | -27.1011 | -2.8466 | -2.8470 |
| 0.2129 | 1.46 | 750  | 0.6233 | -1.8931 | -2.8716 | 0.6571 | 0.9785 | -34.3156 | -27.1720 | -2.8458 | -2.8462 |
| 0.3026 | 1.56 | 800  | 0.6224 | -1.9044 | -2.8881 | 0.6593 | 0.9837 | -34.3486 | -27.1947 | -2.8455 | -2.8458 |
| 0.3361 | 1.66 | 850  | 0.6242 | -1.9113 | -2.9007 | 0.6659 | 0.9894 | -34.3738 | -27.2085 | -2.8456 | -2.8460 |
| 0.2965 | 1.76 | 900  | 0.6223 | -1.9123 | -2.8982 | 0.6615 | 0.9859 | -34.3687 | -27.2103 | -2.8456 | -2.8460 |
| 0.2779 | 1.86 | 950  | 0.6213 | -1.9078 | -2.8977 | 0.6593 | 0.9900 | -34.3678 | -27.2013 | -2.8455 | -2.8459 |
| 0.2334 | 1.95 | 1000 | 0.6223 | -1.9087 | -2.8966 | 0.6593 | 0.9879 | -34.3656 | -27.2032 | -2.8455 | -2.8459 |

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2