---
library_name: peft
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
base_model: NbAiLab/nb-gpt-j-6B-v2
datasets:
  - hugodk-sch/aftonposten_title_prefs
model-index:
  - name: aftonposten-6b-align-scan
    results: []
---

aftonposten-6b-align-scan

This model is a DPO-aligned version of data/ap-gpt-j-6b-sft-qlora-04-08, trained as a PEFT adapter on top of NbAiLab/nb-gpt-j-6B-v2 using the hugodk-sch/aftonposten_title_prefs preference dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4730
  • Rewards/chosen: 0.2464
  • Rewards/rejected: 0.1175
  • Rewards/accuracies: 0.5947
  • Rewards/margins: 0.1288
  • Logps/rejected: -37.3207
  • Logps/chosen: -33.6239
  • Logits/rejected: -2.1470
  • Logits/chosen: -2.1517
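The reward columns are the implicit DPO rewards that TRL logs during evaluation: each completion is scored by the policy-to-reference log-probability ratio, scaled by the DPO temperature β (the β used for this run is not reported in this card). A sketch of the definitions, with y_w the chosen and y_l the rejected completion for prompt x:

```latex
% Implicit DPO reward metrics as logged by TRL; beta is not reported in this card.
\begin{aligned}
r_{\text{chosen}}   &= \beta\,\bigl[\log \pi_\theta(y_w \mid x) - \log \pi_{\mathrm{ref}}(y_w \mid x)\bigr] \\
r_{\text{rejected}} &= \beta\,\bigl[\log \pi_\theta(y_l \mid x) - \log \pi_{\mathrm{ref}}(y_l \mid x)\bigr] \\
\text{rewards/margins}    &= r_{\text{chosen}} - r_{\text{rejected}} \\
\text{rewards/accuracies} &= \Pr\bigl[\, r_{\text{chosen}} > r_{\text{rejected}} \,\bigr]
\end{aligned}
```

The evaluation numbers above are consistent with these definitions: 0.2464 − 0.1175 ≈ 0.1288, the reported margin.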

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 4
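A minimal sketch of how these settings might map onto a TRL DPO run of the kind used by the alignment-handbook recipes. The DPO β, the LoRA configuration, sequence lengths, and the evaluation split name are not reported in this card, so the values marked below are placeholders, not the author's settings:

```python
# Hedged sketch only: hyperparameters mirror this card; beta, the LoRA config,
# and split names are assumptions, not values reported by the author.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from datasets import load_dataset
from peft import LoraConfig
from trl import DPOTrainer

sft_checkpoint = "data/ap-gpt-j-6b-sft-qlora-04-08"  # SFT checkpoint named in this card
tokenizer = AutoTokenizer.from_pretrained("NbAiLab/nb-gpt-j-6B-v2")
model = AutoModelForCausalLM.from_pretrained(sft_checkpoint)

dataset = load_dataset("hugodk-sch/aftonposten_title_prefs")

training_args = TrainingArguments(
    output_dir="aftonposten-6b-align-scan",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # 4 x 2 = total train batch size of 8
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # Adam with betas=(0.9, 0.999) and eps=1e-8 is the Transformers default optimizer.
)

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)  # assumed, not reported

trainer = DPOTrainer(
    model=model,
    ref_model=None,              # with a PEFT adapter, TRL uses the frozen base as the reference
    args=training_args,
    beta=0.1,                    # DPO temperature; not reported in this card
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],  # split name assumed
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```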

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4799 | 0.26 | 100  | 0.4987 | 0.0790 | 0.0734 | 0.5104 | 0.0056 | -37.3943 | -33.9029 | -2.2301 | -2.2349 |
| 0.4548 | 0.52 | 200  | 0.4956 | 0.1590 | 0.1392 | 0.5341 | 0.0198 | -37.2846 | -33.7696 | -2.2287 | -2.2335 |
| 0.41   | 0.78 | 300  | 0.4937 | 0.1639 | 0.1391 | 0.5361 | 0.0248 | -37.2848 | -33.7614 | -2.2261 | -2.2309 |
| 0.3497 | 1.04 | 400  | 0.4927 | 0.2171 | 0.1863 | 0.5652 | 0.0309 | -37.2062 | -33.6727 | -2.2113 | -2.2162 |
| 0.2906 | 1.3  | 500  | 0.4870 | 0.2484 | 0.1921 | 0.5922 | 0.0563 | -37.1964 | -33.6205 | -2.1834 | -2.1881 |
| 0.3014 | 1.56 | 600  | 0.4796 | 0.2630 | 0.1719 | 0.5797 | 0.0911 | -37.2301 | -33.5962 | -2.1694 | -2.1741 |
| 0.2776 | 1.82 | 700  | 0.4825 | 0.2341 | 0.1554 | 0.5768 | 0.0787 | -37.2576 | -33.6444 | -2.1625 | -2.1672 |
| 0.201  | 2.08 | 800  | 0.4766 | 0.2639 | 0.1595 | 0.5914 | 0.1043 | -37.2507 | -33.5948 | -2.1641 | -2.1689 |
| 0.1721 | 2.34 | 900  | 0.4749 | 0.2446 | 0.1298 | 0.5860 | 0.1148 | -37.3003 | -33.6269 | -2.1516 | -2.1563 |
| 0.2259 | 2.6  | 1000 | 0.4736 | 0.2483 | 0.1257 | 0.5860 | 0.1226 | -37.3072 | -33.6207 | -2.1481 | -2.1528 |
| 0.2405 | 2.86 | 1100 | 0.4740 | 0.2438 | 0.1229 | 0.5860 | 0.1209 | -37.3118 | -33.6283 | -2.1475 | -2.1522 |
| 0.1793 | 3.12 | 1200 | 0.4746 | 0.2441 | 0.1249 | 0.5685 | 0.1192 | -37.3085 | -33.6277 | -2.1469 | -2.1516 |
| 0.1633 | 3.38 | 1300 | 0.4744 | 0.2433 | 0.1235 | 0.6009 | 0.1198 | -37.3107 | -33.6290 | -2.1471 | -2.1518 |
| 0.202  | 3.64 | 1400 | 0.4748 | 0.2450 | 0.1279 | 0.5831 | 0.1170 | -37.3034 | -33.6263 | -2.1472 | -2.1519 |
| 0.1889 | 3.9  | 1500 | 0.4727 | 0.2480 | 0.1188 | 0.6005 | 0.1292 | -37.3186 | -33.6212 | -2.1470 | -2.1517 |

Framework versions

  • PEFT 0.8.2
  • Transformers 4.37.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.17.0
  • Tokenizers 0.15.1
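
Since this card lists PEFT as the library and NbAiLab/nb-gpt-j-6B-v2 as the base model, loading the adapter for inference with the versions above might look roughly like the sketch below. The adapter repo id is inferred from the model name on this card and the prompt format is not documented, so both are assumptions:

```python
# Hedged sketch: adapter id and prompt format are assumptions, not documented in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "NbAiLab/nb-gpt-j-6B-v2"
adapter_id = "hugodk-sch/aftonposten-6b-align-scan"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO-trained adapter

prompt = "..."  # article text; the expected prompt format is not documented in this card
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```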