---
library_name: peft
tags:
- trl
- dpo
- alignment-handbook
- generated_from_trainer
base_model: NbAiLab/nb-gpt-j-6B-v2
model-index:
- name: aftonposten-6b-align-scan
  results: []
---

# aftonposten-6b-align-scan

This model is a fine-tuned version of [NbAiLab/nb-gpt-j-6B-v2](https://huggingface.co/NbAiLab/nb-gpt-j-6B-v2) on an unspecified dataset. It achieves the following results on the evaluation set (a sketch of how the Rewards/* metrics are derived follows the list):

- Loss: 0.9176
- Rewards/chosen: -0.0906
- Rewards/rejected: -0.1873
- Rewards/accuracies: 0.5627
- Rewards/margins: 0.0967
- Logps/rejected: -37.7248
- Logps/chosen: -34.1353
- Logits/rejected: -2.2000
- Logits/chosen: -2.2049
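
For context, the Rewards/* columns are the implicit DPO rewards that TRL's `DPOTrainer` logs: beta-scaled log-probability ratios between the trained policy and the frozen reference model. Below is a minimal sketch of how these metrics are derived; the DPO temperature is not recorded in this card, so `beta=0.1` is an assumption:

```python
import torch

def dpo_reward_metrics(policy_chosen_logps, policy_rejected_logps,
                       ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the Rewards/* metrics logged during DPO training.

    Inputs are per-example summed log-probs of the chosen/rejected responses
    under the policy and the frozen reference model. beta=0.1 is an assumed
    DPO temperature; the actual value is not recorded in this card.
    """
    # Implicit DPO reward: how much more (or less) likely the policy makes a
    # response compared to the reference model, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/margins": (chosen_rewards - rejected_rewards).mean().item(),
        # Fraction of pairs where the chosen response out-scores the rejected one.
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
    }
```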

## Model description

More information needed

## Intended uses & limitations

More information needed
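
No usage guidance was provided. As a starting point, here is a minimal loading sketch; the adapter repo id `hugodk-sch/aftonposten-6b-align-scan` is inferred from the card metadata, not confirmed:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "hugodk-sch/aftonposten-6b-align-scan"  # assumed repo id

# Loads the NbAiLab/nb-gpt-j-6B-v2 base model and applies the PEFT adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)
tokenizer = AutoTokenizer.from_pretrained("NbAiLab/nb-gpt-j-6B-v2")

inputs = tokenizer("Oslo er", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```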

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they might map onto a TRL `DPOTrainer` call follows the list):

- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
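
For orientation, the sketch below maps these settings onto a TRL `DPOTrainer` run compatible with the framework versions listed further down. The preference dataset, the LoRA configuration, and the DPO `beta` are assumptions; none of them are recorded in this card:

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("NbAiLab/nb-gpt-j-6B-v2")
tokenizer = AutoTokenizer.from_pretrained("NbAiLab/nb-gpt-j-6B-v2")

# Placeholder preference data; the real dataset is not recorded in this card.
train_ds = Dataset.from_dict({
    "prompt": ["Hva er hovedstaden i Norge?"],
    "chosen": ["Hovedstaden i Norge er Oslo."],
    "rejected": ["Vet ikke."],
})

args = TrainingArguments(
    output_dir="aftonposten-6b-align-scan",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # effective train batch size: 4 * 2 = 8
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=4,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,         # with a PEFT adapter, the frozen base model serves as reference
    args=args,
    beta=0.1,               # assumption: the DPO beta is not recorded in this card
    train_dataset=train_ds,
    eval_dataset=train_ds,  # placeholder; a held-out eval split would be used in practice
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32),  # assumed LoRA settings
)
trainer.train()
```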

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.8913        | 0.26  | 100  | 0.9845          | -0.0055        | -0.0215          | 0.5195             | 0.0159          | -37.5405       | -34.0407     | -2.2273         | -2.2322       |
| 0.7293        | 0.52  | 200  | 0.9602          | -0.0172        | -0.0580          | 0.5714             | 0.0408          | -37.5811       | -34.0537     | -2.2238         | -2.2286       |
| 0.6144        | 0.78  | 300  | 0.9713          | -0.0468        | -0.0779          | 0.5282             | 0.0310          | -37.6032       | -34.0866     | -2.2201         | -2.2249       |
| 0.3632        | 1.04  | 400  | 0.9495          | -0.0909        | -0.1434          | 0.5602             | 0.0525          | -37.6760       | -34.1355     | -2.2076         | -2.2125       |
| 0.2994        | 1.3   | 500  | 0.9461          | -0.1647        | -0.2318          | 0.5540             | 0.0671          | -37.7742       | -34.2176     | -2.2162         | -2.2210       |
| 0.3408        | 1.56  | 600  | 0.9077          | -0.0675        | -0.1694          | 0.5868             | 0.1019          | -37.7048       | -34.1096     | -2.2017         | -2.2066       |
| 0.2796        | 1.82  | 700  | 0.9425          | -0.0929        | -0.1626          | 0.5569             | 0.0697          | -37.6973       | -34.1378     | -2.2012         | -2.2061       |
| 0.1052        | 2.08  | 800  | 0.9125          | -0.0848        | -0.1863          | 0.5926             | 0.1015          | -37.7236       | -34.1288     | -2.2003         | -2.2051       |
| 0.095         | 2.34  | 900  | 0.9005          | -0.0802        | -0.1942          | 0.5540             | 0.1140          | -37.7324       | -34.1237     | -2.2019         | -2.2067       |
| 0.123         | 2.6   | 1000 | 0.9194          | -0.0907        | -0.1876          | 0.5511             | 0.0969          | -37.7251       | -34.1353     | -2.1994         | -2.2043       |
| 0.0894        | 2.86  | 1100 | 0.9182          | -0.0915        | -0.1890          | 0.5336             | 0.0976          | -37.7267       | -34.1362     | -2.2001         | -2.2050       |
| 0.1086        | 3.12  | 1200 | 0.9023          | -0.0864        | -0.1976          | 0.5627             | 0.1112          | -37.7362       | -34.1306     | -2.2006         | -2.2054       |
| 0.0577        | 3.38  | 1300 | 0.9154          | -0.0922        | -0.1935          | 0.5598             | 0.1013          | -37.7317       | -34.1370     | -2.2002         | -2.2050       |
| 0.0375        | 3.64  | 1400 | 0.9233          | -0.0896        | -0.1810          | 0.5569             | 0.0914          | -37.7178       | -34.1342     | -2.2002         | -2.2050       |
| 0.0724        | 3.9   | 1500 | 0.9176          | -0.0906        | -0.1873          | 0.5627             | 0.0967          | -37.7248       | -34.1353     | -2.2000         | -2.2049       |

### Framework versions

- PEFT 0.10.0
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.1