---
library_name: peft
tags:
  - trl
  - dpo
  - alignment-handbook
  - generated_from_trainer
base_model: NbAiLab/nb-gpt-j-6B-v2
model-index:
  - name: aftonposten-6b-align-scan
    results: []
---

# aftonposten-6b-align-scan

This model is a fine-tuned version of [NbAiLab/nb-gpt-j-6B-v2](https://huggingface.co/NbAiLab/nb-gpt-j-6B-v2) on an unspecified dataset. It achieves the following results on the evaluation set (a loading sketch follows the list):

- Loss: 1.6201
- Rewards/chosen: 0.0327
- Rewards/rejected: 0.0149
- Rewards/accuracies: 0.5249
- Rewards/margins: 0.0178
- Logps/rejected: -37.4793
- Logps/chosen: -33.9527
- Logits/rejected: -2.2332
- Logits/chosen: -2.2381
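
The adapter can be loaded with PEFT on top of the base model. A minimal sketch follows; the adapter repo id `hugodk-sch/aftonposten-6b-align-scan` is inferred from this card and may need to be replaced with your own path or repo id.

```python
# Minimal loading sketch. ADAPTER_ID is an assumption inferred from the card
# name; substitute a local path or the actual Hub repo id as needed.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "NbAiLab/nb-gpt-j-6B-v2"
ADAPTER_ID = "hugodk-sch/aftonposten-6b-align-scan"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.float16, device_map="auto"
)
# Attach the LoRA adapter weights to the frozen base model.
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)

inputs = tokenizer("Oslo er", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```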

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
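
As a rough sketch, these values map onto `transformers.TrainingArguments` as shown below; the `output_dir` is illustrative, and anything not listed on the card is left at library defaults (model, dataset, and LoRA wiring for the TRL DPO run are omitted).

```python
# Sketch of the hyperparameters above as transformers.TrainingArguments.
# Only values listed on this card are filled in; output_dir is illustrative.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="aftonposten-6b-align-scan",  # assumed output path
    learning_rate=5e-06,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # total train batch size: 4 * 2 = 8
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=4,
    # The listed Adam betas=(0.9, 0.999) and epsilon=1e-08 are the
    # optimizer defaults, so no explicit optimizer arguments are needed.
)
```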

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.4583        | 0.26  | 100  | 1.6162          | 0.0017         | -0.0028          | 0.5245             | 0.0045          | -37.5236       | -34.0303     | -2.2308         | -2.2357       |
| 1.279         | 0.52  | 200  | 1.6100          | -0.0192        | -0.0303          | 0.5257             | 0.0111          | -37.5923       | -34.0825     | -2.2311         | -2.2359       |
| 1.0043        | 0.78  | 300  | 1.5962          | -0.0200        | -0.0335          | 0.5166             | 0.0135          | -37.6004       | -34.0845     | -2.2263         | -2.2312       |
| 0.7239        | 1.04  | 400  | 1.6461          | -0.0219        | -0.0311          | 0.5341             | 0.0092          | -37.5945       | -34.0893     | -2.2276         | -2.2324       |
| 0.6061        | 1.3   | 500  | 1.6487          | -0.0274        | -0.0429          | 0.5395             | 0.0155          | -37.6239       | -34.1030     | -2.2282         | -2.2330       |
| 0.9255        | 1.56  | 600  | 1.5912          | 0.0108         | -0.0119          | 0.5544             | 0.0228          | -37.5464       | -34.0074     | -2.2273         | -2.2321       |
| 0.8252        | 1.82  | 700  | 1.6334          | 0.0226         | 0.0045           | 0.5216             | 0.0180          | -37.5053       | -33.9781     | -2.2298         | -2.2346       |
| 0.2848        | 2.08  | 800  | 1.6033          | 0.0153         | -0.0031          | 0.5249             | 0.0184          | -37.5244       | -33.9964     | -2.2313         | -2.2361       |
| 0.3671        | 2.34  | 900  | 1.6569          | 0.0283         | 0.0177           | 0.5162             | 0.0106          | -37.4723       | -33.9637     | -2.2309         | -2.2358       |
| 0.3936        | 2.6   | 1000 | 1.6203          | 0.0348         | 0.0187           | 0.5428             | 0.0161          | -37.4698       | -33.9475     | -2.2325         | -2.2374       |
| 0.3156        | 2.86  | 1100 | 1.6012          | 0.0302         | 0.0108           | 0.5606             | 0.0194          | -37.4896       | -33.9592     | -2.2326         | -2.2375       |
| 0.2893        | 3.12  | 1200 | 1.5705          | 0.0346         | 0.0103           | 0.5365             | 0.0243          | -37.4909       | -33.9480     | -2.2335         | -2.2383       |
| 0.277         | 3.38  | 1300 | 1.6102          | 0.0314         | 0.0121           | 0.5403             | 0.0194          | -37.4865       | -33.9559     | -2.2333         | -2.2382       |
| 0.139         | 3.64  | 1400 | 1.6181          | 0.0273         | 0.0092           | 0.5307             | 0.0181          | -37.4937       | -33.9663     | -2.2333         | -2.2381       |
| 0.24          | 3.9   | 1500 | 1.6201          | 0.0327         | 0.0149           | 0.5249             | 0.0178          | -37.4793       | -33.9527     | -2.2332         | -2.2381       |
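
For reference, the reward columns follow the standard DPO convention: the implicit reward is the beta-scaled log-probability ratio between the policy and the frozen reference model, and `Rewards/margins` is the gap between chosen and rejected rewards. The card does not state the beta value used, so it stays symbolic below.

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{margins} = r_\theta(x, y_w) - r_\theta(x, y_l)
$$

As a sanity check against the final row: 0.0327 - 0.0149 = 0.0178, matching the reported `Rewards/margins`.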

### Framework versions

- PEFT 0.10.0
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.1