---
library_name: peft
tags:
- trl
- dpo
- alignment-handbook
- generated_from_trainer
base_model: NbAiLab/nb-gpt-j-6B-v2
model-index:
- name: aftonposten-6b-align-scan
results: []
---
# aftonposten-6b-align-scan
This model is a fine-tuned version of [NbAiLab/nb-gpt-j-6B-v2](https://huggingface.co./NbAiLab/nb-gpt-j-6B-v2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2463.1057
- Rewards/chosen: -0.0198
- Rewards/rejected: -0.0244
- Rewards/accuracies: 0.5656
- Rewards/margins: 0.0046
- Logps/rejected: -39.9582
- Logps/chosen: -36.0192
- Logits/rejected: -1.8605
- Logits/chosen: -1.8648
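The reward metrics above follow the implicit reward definition used in DPO training: a response's reward is β times the log-probability ratio between the policy and the frozen reference model, and the margin is the chosen reward minus the rejected reward. A minimal sketch of how these columns relate (the β value and the per-sequence log-probabilities below are illustrative; neither is recorded on this card):

```python
def dpo_reward(logp_policy, logp_ref, beta=0.1):
    """DPO implicit reward: beta * (log p_policy - log p_ref)."""
    return beta * (logp_policy - logp_ref)

# Hypothetical per-sequence log-probs for one preference pair.
chosen_reward = dpo_reward(logp_policy=-36.0, logp_ref=-35.8)
rejected_reward = dpo_reward(logp_policy=-40.0, logp_ref=-39.7)

# Rewards/margins for this pair; the pair counts toward
# Rewards/accuracies when the margin is positive.
margin = chosen_reward - rejected_reward
print(round(margin, 4))
```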
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
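These settings imply an effective batch size of 4 × 2 = 8 per optimizer step, and a learning-rate curve that warms up linearly over the first 10% of steps before decaying along a cosine. A sketch of that schedule (mirroring the behavior of `transformers`' `get_cosine_schedule_with_warmup`; the total step count below is illustrative, not taken from this card):

```python
import math

def lr_at(step, total_steps, base_lr=5e-06, warmup_ratio=0.1):
    """Linear warmup for the first warmup_ratio of steps,
    then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: 4 per device * 2 accumulation steps.
total_train_batch_size = 4 * 2

total_steps = 1536  # illustrative value only
print(lr_at(0, total_steps))            # 0.0 at the start of warmup
print(lr_at(153, total_steps))          # peak LR (5e-06) at the end of warmup
print(lr_at(total_steps, total_steps))  # ~0.0 at the end of training
```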
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 2484.2775 | 0.26 | 100 | 2498.1597 | 0.0002 | -0.0000 | 0.5565 | 0.0002 | -37.5204 | -34.0187 | -2.2275 | -2.2323 |
| 2464.7859 | 0.52 | 200 | 2496.0791 | 0.0000 | -0.0004 | 0.5544 | 0.0004 | -37.5574 | -34.0338 | -2.2242 | -2.2291 |
| 2430.5529 | 0.78 | 300 | 2495.6680 | -0.0003 | -0.0007 | 0.5390 | 0.0005 | -37.5906 | -34.0629 | -2.2213 | -2.2262 |
| 2370.302 | 1.04 | 400 | 2492.4653 | -0.0006 | -0.0014 | 0.5623 | 0.0008 | -37.6613 | -34.0994 | -2.1946 | -2.1994 |
| 2298.9412 | 1.3 | 500 | 2488.4062 | -0.0025 | -0.0037 | 0.5772 | 0.0013 | -37.8883 | -34.2804 | -2.1295 | -2.1343 |
| 2298.7582 | 1.56 | 600 | 2477.6589 | -0.0053 | -0.0077 | 0.6121 | 0.0024 | -38.2854 | -34.5600 | -2.0589 | -2.0636 |
| 2254.2998 | 1.82 | 700 | 2477.6624 | -0.0096 | -0.0122 | 0.5540 | 0.0026 | -38.7330 | -34.9921 | -2.0040 | -2.0086 |
| 2173.4393 | 2.08 | 800 | 2470.5903 | -0.0118 | -0.0152 | 0.5569 | 0.0034 | -39.0386 | -35.2171 | -1.9665 | -1.9710 |
| 2065.36 | 2.34 | 900 | 2467.7329 | -0.0156 | -0.0195 | 0.5598 | 0.0039 | -39.4668 | -35.5978 | -1.9122 | -1.9166 |
| 2196.3246 | 2.6 | 1000 | 2464.3682 | -0.0186 | -0.0229 | 0.5511 | 0.0044 | -39.8091 | -35.8901 | -1.8782 | -1.8826 |
| 2237.6512 | 2.86 | 1100 | 2464.2991 | -0.0196 | -0.0241 | 0.5453 | 0.0044 | -39.9232 | -35.9976 | -1.8646 | -1.8690 |
| 2032.5133 | 3.12 | 1200 | 2464.4653 | -0.0199 | -0.0243 | 0.5598 | 0.0044 | -39.9447 | -36.0201 | -1.8618 | -1.8662 |
| 1967.852 | 3.38 | 1300 | 2461.2036 | -0.0197 | -0.0245 | 0.5540 | 0.0048 | -39.9632 | -36.0051 | -1.8608 | -1.8652 |
| 2084.0914 | 3.64 | 1400 | 2462.6121 | -0.0199 | -0.0245 | 0.5598 | 0.0046 | -39.9675 | -36.0243 | -1.8613 | -1.8656 |
| 2032.7156 | 3.9 | 1500 | 2463.1057 | -0.0198 | -0.0244 | 0.5656 | 0.0046 | -39.9582 | -36.0192 | -1.8605 | -1.8648 |
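Under the Bradley-Terry interpretation commonly used to read DPO metrics, the final reward margin of 0.0046 maps to an implied preference probability of sigmoid(0.0046) ≈ 0.501, i.e. barely above chance, which is consistent with the evaluation accuracy hovering in the 0.54-0.61 range. A small sketch of that reading:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

final_margin = 0.0046            # Rewards/margins at step 1500
implied_pref = sigmoid(final_margin)
print(round(implied_pref, 4))    # ~0.5011, barely above chance
```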
### Framework versions
- PEFT 0.8.2
- Transformers 4.37.2
- Pytorch 2.1.2+cu121
- Datasets 2.17.0
- Tokenizers 0.15.1