# Llama-3.1-8B-Instruct-SAA-800
This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on the bct_non_cot_dpo_800 dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

- Loss: 0.1529
- Rewards/chosen: -0.0121
- Rewards/rejected: -0.0692
- Rewards/accuracies: 0.8125
- Rewards/margins: 0.0571
- Logps/rejected: -0.6922
- Logps/chosen: -0.1209
- Logits/rejected: -0.3690
- Logits/chosen: -0.3184
- SFT Loss: 0.0162
- Odds Ratio Loss: 1.3676
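
The sketch below shows one way to load and query the model. It assumes this repository hosts a PEFT (LoRA) adapter on top of meta-llama/Llama-3.1-8B-Instruct (PEFT is listed under the framework versions below); the generation settings are illustrative, not an official usage recipe.

```python
# Minimal, hedged loading/inference sketch. It assumes this repo contains a
# PEFT (LoRA) adapter for the base instruct model rather than full weights.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-SAA-800"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter
model.eval()

messages = [{"role": "user", "content": "Hello, who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```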
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged reproduction sketch follows the list):

- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
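
The evaluation metrics above report both an SFT loss and an odds-ratio loss, which is consistent with an ORPO-style preference objective. The sketch below wires the listed hyperparameters into TRL's ORPOTrainer as one plausible reproduction; the dataset file, LoRA settings, beta weighting, and precision flag are assumptions, and the original run may have used a different training framework.

```python
# Hedged ORPO-style reproduction sketch: hyperparameters mirror the list above;
# the dataset file, LoRA config, beta, and bf16 flag are illustrative assumptions.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Preference data with "prompt"/"chosen"/"rejected" columns; the file name is a placeholder.
train_dataset = load_dataset("json", data_files="bct_non_cot_dpo_800.json", split="train")

args = ORPOConfig(
    output_dir="Llama-3.1-8B-Instruct-SAA-800",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,  # 2 x 8 = total train batch size of 16
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,   # odds-ratio weighting; not reported on this card, assumed default
    bf16=True,  # assumed precision
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # train a LoRA adapter, matching the PEFT dependency
)
trainer.train()
```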
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | SFT Loss | Odds Ratio Loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.4945 | 1.1111 | 50 | 1.2763 | -0.1228 | -0.1709 | 0.7875 | 0.0481 | -1.7095 | -1.2281 | -0.4548 | -0.3794 | 0.1502 | 11.2609 |
| 0.2666 | 2.2222 | 100 | 0.2491 | -0.0212 | -0.0729 | 0.8250 | 0.0517 | -0.7288 | -0.2119 | -0.4319 | -0.3666 | 0.0242 | 2.2484 |
| 0.1014 | 3.3333 | 150 | 0.1632 | -0.0129 | -0.0606 | 0.8125 | 0.0477 | -0.6058 | -0.1292 | -0.3820 | -0.3284 | 0.0168 | 1.4635 |
| 0.1429 | 4.4444 | 200 | 0.1534 | -0.0121 | -0.0584 | 0.8125 | 0.0463 | -0.5841 | -0.1211 | -0.3818 | -0.3298 | 0.0158 | 1.3752 |
| 0.1007 | 5.5556 | 250 | 0.1530 | -0.0121 | -0.0641 | 0.8125 | 0.0520 | -0.6407 | -0.1206 | -0.3740 | -0.3235 | 0.0159 | 1.3704 |
| 0.1385 | 6.6667 | 300 | 0.1534 | -0.0122 | -0.0688 | 0.8000 | 0.0566 | -0.6881 | -0.1217 | -0.3725 | -0.3214 | 0.0161 | 1.3729 |
| 0.0918 | 7.7778 | 350 | 0.1537 | -0.0122 | -0.0689 | 0.8125 | 0.0567 | -0.6889 | -0.1217 | -0.3698 | -0.3191 | 0.0162 | 1.3742 |
| 0.0752 | 8.8889 | 400 | 0.1534 | -0.0121 | -0.0690 | 0.8000 | 0.0568 | -0.6896 | -0.1213 | -0.3706 | -0.3195 | 0.0162 | 1.3723 |
| 0.1052 | 10.0 | 450 | 0.1529 | -0.0121 | -0.0692 | 0.8125 | 0.0571 | -0.6922 | -0.1209 | -0.3690 | -0.3184 | 0.0162 | 1.3676 |
### Framework versions

- PEFT 0.12.0
- Transformers 4.45.2
- PyTorch 2.3.0
- Datasets 2.19.0
- Tokenizers 0.20.0
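
For reproducing the environment, the small convenience check below compares the locally installed packages against the versions listed above; the exact versions are only needed for a faithful reproduction.

```python
# Convenience check: compare installed package versions with the ones listed above.
from importlib.metadata import version

expected = {
    "peft": "0.12.0",
    "transformers": "4.45.2",
    "torch": "2.3.0",
    "datasets": "2.19.0",
    "tokenizers": "0.20.0",
}
for package, wanted in expected.items():
    installed = version(package)
    note = "OK" if installed == wanted else f"card lists {wanted}"
    print(f"{package}: {installed} ({note})")
```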