---
license: llama3.1
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
model-index:
  - name: llama3.1_8b_dpo_bwgenerator_test2
    results: []
---

llama3.1_8b_dpo_bwgenerator_test2

This model is a PEFT adapter for meta-llama/Meta-Llama-3.1-8B-Instruct, fine-tuned with DPO (Direct Preference Optimization) via TRL on an unknown dataset. It achieves the following results on the evaluation set (the reward metrics are defined in the sketch after this list):

  • Loss: 0.5181
  • Rewards/chosen: -0.4278
  • Rewards/rejected: -0.8508
  • Rewards/accuracies: 0.9255
  • Rewards/margins: 0.4230
  • Logps/rejected: -118.6553
  • Logps/chosen: -88.8263
  • Logits/rejected: -0.9049
  • Logits/chosen: -1.6027
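
For context, these reward metrics follow the standard DPO formulation: a completion's implicit reward is β times the log-probability ratio between the trained policy and the frozen reference model, the margin is the chosen reward minus the rejected reward, and accuracy is the fraction of pairs where the chosen completion outscores the rejected one. A minimal sketch in plain PyTorch (β = 0.1 is an assumed value, TRL's default; this is not the exact TRL code used for this run):

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the DPO objective behind the metrics above.

    Each argument is a tensor of summed per-sequence log-probs for a
    batch of completions. beta=0.1 is an assumption (TRL's default),
    not a value reported on this card.
    """
    # Implicit rewards: beta * log(pi_theta / pi_ref)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)        # Rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = chosen_rewards - rejected_rewards                             # Rewards/margins
    # Rewards/accuracies: fraction of pairs where chosen outscores rejected
    accuracies = (chosen_rewards > rejected_rewards).float().mean()
    # DPO loss: -log sigmoid(margin)
    loss = -F.logsigmoid(margins).mean()
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracies
```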

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a configuration sketch follows the list:

  • learning_rate: 2e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1
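
A training call consistent with these hyperparameters might look like the sketch below. It targets a TRL release contemporary with the pinned Transformers 4.44 / PEFT 0.10 (newer TRL versions expect a DPOConfig instead of TrainingArguments). The toy dataset and LoRA settings are illustrative assumptions; only the hyperparameters listed above come from this card.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig
from trl import DPOTrainer

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Toy preference data in the prompt/chosen/rejected format DPOTrainer expects;
# the actual training data is not documented on this card.
train_dataset = Dataset.from_dict({
    "prompt": ["Write a haiku about rain."],
    "chosen": ["Soft rain on the roof, / the gutters hum a slow song, / streetlights blur and drift."],
    "rejected": ["Rain is wet."],
})

args = TrainingArguments(
    output_dir="llama3.1_8b_dpo_bwgenerator_test2",
    learning_rate=2e-6,             # from this card
    per_device_train_batch_size=4,  # from this card
    per_device_eval_batch_size=4,   # from this card
    seed=42,                        # from this card
    lr_scheduler_type="linear",     # from this card
    num_train_epochs=1,             # from this card
)

# LoRA settings are assumptions; the card does not report them.
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

trainer = DPOTrainer(
    model,
    ref_model=None,                 # with a PEFT adapter, TRL uses the frozen base as the reference
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```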

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6002        | 0.0719 | 1000  | 0.5390          | -0.3845        | -0.7489          | 0.9194             | 0.3644          | -117.6367      | -88.3940     | -0.9017         | -1.6013       |
| 0.5325        | 0.1438 | 2000  | 0.5237          | -0.4184        | -0.8254          | 0.9200             | 0.4070          | -118.4018      | -88.7330     | -0.9052         | -1.6035       |
| 0.5221        | 0.2157 | 3000  | 0.5199          | -0.4201        | -0.8376          | 0.9210             | 0.4175          | -118.5239      | -88.7496     | -0.9038         | -1.6021       |
| 0.518         | 0.2876 | 4000  | 0.5178          | -0.4376        | -0.8621          | 0.9220             | 0.4246          | -118.7688      | -88.9242     | -0.9056         | -1.6036       |
| 0.5177        | 0.3595 | 5000  | 0.5176          | -0.4317        | -0.8563          | 0.9213             | 0.4246          | -118.7104      | -88.8652     | -0.9063         | -1.6039       |
| 0.5186        | 0.4313 | 6000  | 0.5180          | -0.4361        | -0.8604          | 0.9200             | 0.4243          | -118.7512      | -88.9096     | -0.9063         | -1.6040       |
| 0.522         | 0.5032 | 7000  | 0.5175          | -0.4358        | -0.8614          | 0.9210             | 0.4255          | -118.7612      | -88.9070     | -0.9057         | -1.6035       |
| 0.5194        | 0.5751 | 8000  | 0.5182          | -0.4280        | -0.8506          | 0.9249             | 0.4226          | -118.6538      | -88.8285     | -0.9039         | -1.6020       |
| 0.5149        | 0.6470 | 9000  | 0.5179          | -0.4413        | -0.8651          | 0.9229             | 0.4238          | -118.7981      | -88.9612     | -0.9060         | -1.6038       |
| 0.5209        | 0.7189 | 10000 | 0.5178          | -0.4355        | -0.8600          | 0.9216             | 0.4244          | -118.7471      | -88.9040     | -0.9049         | -1.6027       |
| 0.517         | 0.7908 | 11000 | 0.5187          | -0.4343        | -0.8561          | 0.9194             | 0.4217          | -118.7081      | -88.8918     | -0.9046         | -1.6027       |
| 0.5202        | 0.8627 | 12000 | 0.5186          | -0.4321        | -0.8540          | 0.9197             | 0.4220          | -118.6880      | -88.8693     | -0.9047         | -1.6026       |
| 0.5212        | 0.9346 | 13000 | 0.5181          | -0.4278        | -0.8508          | 0.9255             | 0.4230          | -118.6553      | -88.8263     | -0.9049         | -1.6027       |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.44.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.14.7
  • Tokenizers 0.19.1
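
To run the model, load the base checkpoint and attach this PEFT adapter on top; a minimal usage sketch (dtype, device placement, and generation settings are illustrative choices, not values from this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter = "NanQiangHF/llama3.1_8b_dpo_bwgenerator_test2"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)  # attach the DPO-trained adapter

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```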