---
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: NbAiLab/nb-gpt-j-6B-v2
datasets:
- hugodk-sch/aftonposten_title_prefs
model-index:
- name: aftonposten-6b-align-scan
  results: []
---

# aftonposten-6b-align-scan

This model is a fine-tuned version of [data/ap-gpt-j-6b-sft-qlora-04-08](https://huggingface.co/data/ap-gpt-j-6b-sft-qlora-04-08) on the hugodk-sch/aftonposten_title_prefs dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7456
- Rewards/chosen: 0.0514
- Rewards/rejected: 0.0332
- Rewards/accuracies: 0.5365
- Rewards/margins: 0.0182
- Logps/rejected: -37.4614
- Logps/chosen: -33.9489
- Logits/rejected: -2.2407
- Logits/chosen: -2.2456

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5903        | 0.26  | 100  | 0.7450          | 0.0123         | 0.0047           | 0.5241             | 0.0075          | -37.5087       | -34.0141     | -2.2318         | -2.2366       |
| 0.59          | 0.52  | 200  | 0.7500          | -0.0208        | -0.0352          | 0.5365             | 0.0144          | -37.5752       | -34.0691     | -2.2284         | -2.2332       |
| 0.3871        | 0.78  | 300  | 0.7655          | -0.0166        | -0.0295          | 0.5071             | 0.0129          | -37.5657       | -34.0622     | -2.2285         | -2.2333       |
| 0.3315        | 1.04  | 400  | 0.7479          | -0.0009        | -0.0296          | 0.5515             | 0.0287          | -37.5660       | -34.0361     | -2.2265         | -2.2314       |
| 0.3857        | 1.3   | 500  | 0.7601          | -0.0010        | -0.0248          | 0.5577             | 0.0238          | -37.5580       | -34.0362     | -2.2294         | -2.2343       |
| 0.5531        | 1.56  | 600  | 0.7435          | 0.0040         | -0.0183          | 0.5577             | 0.0223          | -37.5471       | -34.0279     | -2.2416         | -2.2465       |
| 0.4315        | 1.82  | 700  | 0.8141          | -0.0077        | -0.0028          | 0.4996             | -0.0049         | -37.5213       | -34.0474     | -2.2449         | -2.2498       |
| 0.3445        | 2.08  | 800  | 0.8104          | 0.0382         | 0.0369           | 0.5282             | 0.0012          | -37.4551       | -33.9710     | -2.2437         | -2.2485       |
| 0.1902        | 2.34  | 900  | 0.8123          | 0.0329         | 0.0364           | 0.5199             | -0.0035         | -37.4560       | -33.9797     | -2.2426         | -2.2475       |
| 0.1724        | 2.6   | 1000 | 0.7850          | 0.0479         | 0.0385           | 0.5365             | 0.0094          | -37.4525       | -33.9548     | -2.2433         | -2.2481       |
| 0.1864        | 2.86  | 1100 | 0.7625          | 0.0435         | 0.0297           | 0.5133             | 0.0137          | -37.4671       | -33.9621     | -2.2392         | -2.2441       |
| 0.1629        | 3.12  | 1200 | 0.7624          | 0.0479         | 0.0337           | 0.5341             | 0.0142          | -37.4604       | -33.9547     | -2.2387         | -2.2436       |
| 0.1881        | 3.38  | 1300 | 0.7945          | 0.0454         | 0.0440           | 0.4967             | 0.0014          | -37.4432       | -33.9588     | -2.2391         | -2.2440       |
| 0.1395        | 3.64  | 1400 | 0.7520          | 0.0496         | 0.0341           | 0.5453             | 0.0156          | -37.4598       | -33.9518     | -2.2408         | -2.2457       |
| 0.1436        | 3.9   | 1500 | 0.7352          | 0.0537         | 0.0336           | 0.5399             | 0.0201          | -37.4606       | -33.9451     | -2.2407         | -2.2456       |

### Framework versions

- PEFT 0.8.2
- Transformers 4.37.2
- Pytorch 2.1.2+cu121
- Datasets 2.17.0
- Tokenizers 0.15.1
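
## Example usage

Since the card gives no usage instructions, here is a minimal loading sketch. It assumes the adapter is published under the repo id `hugodk-sch/aftonposten-6b-align-scan` (inferred from the model name and the dataset owner above, not stated in the card); substitute a local path or the correct repo id if yours differs.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "NbAiLab/nb-gpt-j-6B-v2"             # base model from the card metadata
adapter_id = "hugodk-sch/aftonposten-6b-align-scan"  # assumed repo id for this adapter

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype=torch.float16, device_map="auto"
)

# Attach the DPO-trained LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Example prompt (the dataset name suggests Norwegian headline preferences)
inputs = tokenizer("Skriv en tittel for denne artikkelen:", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```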
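
## Training setup sketch

The hyperparameters listed under "Training procedure" map naturally onto `trl`'s `DPOTrainer` (the library the `trl`/`dpo` tags point to). The sketch below is not the actual training script: only the `TrainingArguments` values come from the card, while the DPO `beta`, the LoRA settings, and the dataset split names are placeholders, since they are not recorded here.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "NbAiLab/nb-gpt-j-6B-v2"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference dataset named in the card; split names are assumptions
dataset = load_dataset("hugodk-sch/aftonposten_title_prefs")

# Values below are taken verbatim from the hyperparameter list in this card
args = TrainingArguments(
    output_dir="aftonposten-6b-align-scan",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # 4 x 2 = total train batch size of 8
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

# Placeholder LoRA settings: the actual adapter config is not recorded in this card
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

trainer = DPOTrainer(
    model,
    ref_model=None,   # with peft_config set, the frozen base weights serve as the reference
    args=args,
    beta=0.1,         # placeholder: the beta used for this run is not recorded
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```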