---
library_name: peft
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
base_model: NbAiLab/nb-gpt-j-6B-v2
datasets:
  - hugodk-sch/aftonposten_title_prefs
model-index:
  - name: aftonposten-6b-align-scan
    results: []
---

# aftonposten-6b-align-scan

This model is a DPO-aligned version of data/ap-gpt-j-6b-sft-qlora-04-08 (a PEFT adapter whose base model is NbAiLab/nb-gpt-j-6B-v2), trained on the hugodk-sch/aftonposten_title_prefs dataset. It achieves the following results on the evaluation set (a sketch of how the reward metrics are computed follows the list):

- Loss: 6.0779
- Rewards/chosen: -0.0133
- Rewards/rejected: -0.0383
- Rewards/accuracies: 0.5714
- Rewards/margins: 0.0250
- Logps/rejected: -37.7083
- Logps/chosen: -34.1011
- Logits/rejected: -2.2004
- Logits/chosen: -2.2052
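
The reward columns are the implicit DPO rewards that TRL's `DPOTrainer` logs: `beta` times the gap between the policy's and the reference model's log-probability of a completion. Below is a minimal sketch of that computation, assuming TRL's default `beta=0.1` (the value actually used for this run is not recorded in this card):

```python
import torch

def dpo_reward_metrics(policy_chosen_logps, policy_rejected_logps,
                       ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Implicit DPO rewards: beta * (policy logp - reference logp) per completion."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        # Margin: how strongly chosen completions are preferred over rejected ones.
        "rewards/margins": (chosen_rewards - rejected_rewards).mean().item(),
        # Accuracy: fraction of pairs where the chosen completion wins.
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
    }
```

On this evaluation set the mean margin is positive (0.0250) but the accuracy is only 0.5714, i.e. the model ranks the preferred title above the rejected one in a modest majority of pairs.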

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
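
For reference, a sketch of the equivalent `transformers.TrainingArguments`; the `output_dir` is illustrative, and the multi-GPU launcher and DPO-specific options (e.g. `beta`) are omitted because they are not recorded in this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="aftonposten-6b-align-scan",  # illustrative
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # 4 per device x 2 steps = total train batch size 8
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",  # betas=(0.9, 0.999) and eps=1e-08 are the defaults
)
```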

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 5.6745        | 0.26  | 100  | 6.2508          | 0.0038         | -0.0003          | 0.5461             | 0.0041          | -37.5181       | -34.0153     | -2.2290         | -2.2338       |
| 5.2135        | 0.52  | 200  | 6.2881          | 0.0066         | 0.0025           | 0.5403             | 0.0041          | -37.5042       | -34.0014     | -2.2267         | -2.2315       |
| 4.3883        | 0.78  | 300  | 6.2382          | 0.0031         | -0.0032          | 0.5166             | 0.0062          | -37.5325       | -34.0193     | -2.2243         | -2.2291       |
| 2.9753        | 1.04  | 400  | 6.0369          | 0.0069         | -0.0106          | 0.6034             | 0.0176          | -37.5698       | -33.9999     | -2.2093         | -2.2141       |
| 2.4163        | 1.3   | 500  | 6.0677          | -0.0149        | -0.0375          | 0.5801             | 0.0225          | -37.7039       | -34.1092     | -2.1858         | -2.1907       |
| 2.52          | 1.56  | 600  | 5.9990          | -0.0097        | -0.0348          | 0.5748             | 0.0251          | -37.6905       | -34.0832     | -2.1951         | -2.1999       |
| 2.9186        | 1.82  | 700  | 6.1696          | -0.0176        | -0.0364          | 0.5598             | 0.0188          | -37.6988       | -34.1227     | -2.2048         | -2.2097       |
| 1.2867        | 2.08  | 800  | 6.0594          | -0.0122        | -0.0361          | 0.5777             | 0.0239          | -37.6970       | -34.0957     | -2.2060         | -2.2109       |
| 0.8862        | 2.34  | 900  | 6.0621          | -0.0165        | -0.0403          | 0.5918             | 0.0237          | -37.7179       | -34.1172     | -2.2027         | -2.2076       |
| 1.2395        | 2.6   | 1000 | 6.0000          | -0.0163        | -0.0418          | 0.5864             | 0.0255          | -37.7257       | -34.1161     | -2.2002         | -2.2050       |
| 1.4312        | 2.86  | 1100 | 5.9905          | -0.0144        | -0.0409          | 0.5860             | 0.0264          | -37.7210       | -34.1067     | -2.1989         | -2.2038       |
| 1.0133        | 3.12  | 1200 | 6.1103          | -0.0167        | -0.0396          | 0.5889             | 0.0229          | -37.7146       | -34.1182     | -2.2000         | -2.2048       |
| 0.5152        | 3.38  | 1300 | 6.0578          | -0.0132        | -0.0383          | 0.5544             | 0.0251          | -37.7080       | -34.1004     | -2.2003         | -2.2051       |
| 0.8378        | 3.64  | 1400 | 6.0572          | -0.0138        | -0.0389          | 0.5748             | 0.0251          | -37.7113       | -34.1035     | -2.2004         | -2.2052       |
| 0.9599        | 3.9   | 1500 | 6.0348          | -0.0125        | -0.0385          | 0.5835             | 0.0260          | -37.7091       | -34.0972     | -2.2004         | -2.2052       |

### Framework versions

- PEFT 0.8.2
- Transformers 4.37.2
- Pytorch 2.1.2+cu121
- Datasets 2.17.0
- Tokenizers 0.15.1
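
Since this repository contains a PEFT adapter, it can be loaded with the PEFT auto classes from the versions above. A minimal sketch; the repo id is inferred from the card name, the tokenizer comes from the base model in the metadata, and the prompt is illustrative:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "hugodk-sch/aftonposten-6b-align-scan"  # inferred repo id

# Loads the base model declared in the adapter config and applies the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("NbAiLab/nb-gpt-j-6B-v2")  # base model

inputs = tokenizer("En kort nyhetssak:", return_tensors="pt")  # illustrative prompt
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```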