---
library_name: peft
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
base_model: NbAiLab/nb-gpt-j-6B-v2
datasets:
  - hugodk-sch/aftonposten_title_prefs
model-index:
  - name: aftonposten-6b-align-scan
    results: []
---

aftonposten-6b-align-scan

This model is a DPO fine-tune of data/ap-gpt-j-6b-sft-qlora-04-08 (a local SFT QLoRA checkpoint; the metadata lists NbAiLab/nb-gpt-j-6B-v2 as the underlying base model) on the hugodk-sch/aftonposten_title_prefs dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6747
  • Rewards/chosen: -0.2693
  • Rewards/rejected: -0.4359
  • Rewards/accuracies: 0.6009
  • Rewards/margins: 0.1666
  • Logps/rejected: -38.1393
  • Logps/chosen: -34.4192
  • Logits/rejected: -2.1170
  • Logits/chosen: -2.1217
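
The reward columns above are the standard statistics logged by TRL's `DPOTrainer`. As a minimal sketch of how they are defined (the `beta` used for this run is not recorded in the card; 0.1 is TRL's default and is only an assumption here):

```python
import torch.nn.functional as F

def dpo_stats(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # "Rewards/chosen" and "Rewards/rejected": beta-scaled log-probability
    # ratios of each completion under the policy vs. the frozen reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards      # "Rewards/margins"
    accuracies = (margins > 0).float().mean()        # "Rewards/accuracies"
    loss = -F.logsigmoid(margins).mean()             # sigmoid DPO loss
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracies
```

The Logps/* columns are summed token log-probabilities of each completion under the policy, and the Logits/* columns are mean output logits on those tokens, as logged by TRL.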

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 4
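
For readers reconstructing the setup, the list above maps onto `transformers.TrainingArguments` roughly as follows; this is a sketch, not the original launch script. The `output_dir` is assumed, the Adam betas/epsilon in the list are the optimizer defaults, and DPO-specific settings such as `beta` would be passed to `DPOTrainer` separately:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="aftonposten-6b-align-scan",  # assumption
    learning_rate=5e-06,
    per_device_train_batch_size=4,   # 4 x 2 grad-accum steps = 8 total
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=4,
)
```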

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6577        | 0.26  | 100  | 0.6949          | -0.0038        | -0.0133          | 0.5316             | 0.0095          | -37.5357       | -34.0400     | -2.2289         | -2.2338       |
| 0.6156        | 0.52  | 200  | 0.6943          | -0.0132        | -0.0288          | 0.5191             | 0.0156          | -37.5578       | -34.0535     | -2.2268         | -2.2317       |
| 0.5468        | 0.78  | 300  | 0.6903          | -0.0237        | -0.0486          | 0.5191             | 0.0249          | -37.5860       | -34.0684     | -2.2229         | -2.2277       |
| 0.4243        | 1.04  | 400  | 0.6886          | -0.0963        | -0.1489          | 0.5540             | 0.0526          | -37.7293       | -34.1721     | -2.1933         | -2.1981       |
| 0.353         | 1.3   | 500  | 0.6901          | -0.1994        | -0.2743          | 0.5569             | 0.0749          | -37.9085       | -34.3194     | -2.1884         | -2.1932       |
| 0.3554        | 1.56  | 600  | 0.6763          | -0.1468        | -0.2572          | 0.5806             | 0.1103          | -37.8840       | -34.2443     | -2.1701         | -2.1749       |
| 0.3333        | 1.82  | 700  | 0.6813          | -0.1817        | -0.2946          | 0.5743             | 0.1129          | -37.9375       | -34.2941     | -2.1549         | -2.1596       |
| 0.2025        | 2.08  | 800  | 0.6800          | -0.2316        | -0.3667          | 0.5660             | 0.1351          | -38.0405       | -34.3655     | -2.1413         | -2.1461       |
| 0.2153        | 2.34  | 900  | 0.6866          | -0.2482        | -0.3826          | 0.5835             | 0.1344          | -38.0632       | -34.3891     | -2.1292         | -2.1340       |
| 0.2381        | 2.6   | 1000 | 0.6821          | -0.2624        | -0.4162          | 0.5864             | 0.1538          | -38.1112       | -34.4094     | -2.1207         | -2.1255       |
| 0.1898        | 2.86  | 1100 | 0.6858          | -0.2673        | -0.4161          | 0.5831             | 0.1487          | -38.1110       | -34.4164     | -2.1188         | -2.1235       |
| 0.2231        | 3.12  | 1200 | 0.6780          | -0.2626        | -0.4264          | 0.5889             | 0.1637          | -38.1257       | -34.4097     | -2.1175         | -2.1223       |
| 0.164         | 3.38  | 1300 | 0.6834          | -0.2678        | -0.4194          | 0.5947             | 0.1516          | -38.1158       | -34.4172     | -2.1174         | -2.1221       |
| 0.1562        | 3.64  | 1400 | 0.6753          | -0.2666        | -0.4361          | 0.5922             | 0.1696          | -38.1396       | -34.4154     | -2.1172         | -2.1219       |
| 0.2163        | 3.9   | 1500 | 0.6831          | -0.2684        | -0.4218          | 0.5801             | 0.1534          | -38.1192       | -34.4180     | -2.1173         | -2.1220       |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.1
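
With these versions, the adapter can be loaded via PEFT's auto class. A sketch under stated assumptions: the Hub id below is assumed from the model name, and since the adapter was trained on top of a local SFT checkpoint (data/ap-gpt-j-6b-sft-qlora-04-08), the `base_model_name_or_path` stored in the adapter config may point to a path that is not on the Hub and need adjusting:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "hugodk-sch/aftonposten-6b-align-scan"  # assumed repo id

model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)
tokenizer = AutoTokenizer.from_pretrained("NbAiLab/nb-gpt-j-6B-v2")

inputs = tokenizer("Oslo: ", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```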