Llama-3.1-8B-Instruct-SAA-900

This model is a PEFT adapter fine-tuned from meta-llama/Llama-3.1-8B-Instruct on the bct_non_cot_dpo_900 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1515
  • Rewards/chosen: -0.0108
  • Rewards/rejected: -0.0582
  • Rewards/accuracies: 0.8222
  • Rewards/margins: 0.0474
  • Logps/rejected: -0.5819
  • Logps/chosen: -0.1084
  • Logits/rejected: -0.4031
  • Logits/chosen: -0.3480
  • Sft Loss: 0.0132
  • Odds Ratio Loss: 1.3828
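
Since the framework versions listed below include PEFT, the released weights are an adapter that is attached to the base model at load time. A minimal loading sketch follows (an illustration, not an official snippet; it assumes the adapter is published as chchen/Llama-3.1-8B-Instruct-SAA-900 and that you have access to the gated meta-llama/Llama-3.1-8B-Instruct base model):

```python
# Minimal loading sketch (unofficial). Assumes the adapter repo id below and
# access to the gated base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-SAA-900"  # adapter repo (assumed)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the PEFT adapter

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```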

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
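
The total train batch size of 16 is the per-device batch size of 2 multiplied by the 8 gradient-accumulation steps (on a single device). For illustration only, here is a rough mapping of these settings onto Hugging Face `TrainingArguments`; the actual training framework and any preference-optimization-specific options are not shown in this card:

```python
# Illustrative mapping of the listed hyperparameters onto transformers
# TrainingArguments; the exact trainer/config used for this run is not specified here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.1-8b-instruct-saa-900",  # placeholder path
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # 2 x 8 = effective batch size of 16
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```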

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Sft Loss | Odds Ratio Loss |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:---------------:|
| 1.5773        | 0.9877 | 50   | 1.3696          | -0.1315        | -0.1754          | 0.7667             | 0.0440          | -1.7544        | -1.3147      | -0.4663         | -0.4034       | 0.1831   | 11.8657         |
| 0.2518        | 1.9753 | 100  | 0.2349          | -0.0190        | -0.0732          | 0.8111             | 0.0542          | -0.7321        | -0.1898      | -0.4483         | -0.3781       | 0.0216   | 2.1323          |
| 0.1304        | 2.9630 | 150  | 0.1530          | -0.0109        | -0.0612          | 0.8111             | 0.0502          | -0.6117        | -0.1094      | -0.4032         | -0.3454       | 0.0131   | 1.3988          |
| 0.1129        | 3.9506 | 200  | 0.1515          | -0.0108        | -0.0582          | 0.8222             | 0.0474          | -0.5819        | -0.1084      | -0.4031         | -0.3480       | 0.0132   | 1.3828          |
| 0.1194        | 4.9383 | 250  | 0.1522          | -0.0109        | -0.0642          | 0.8222             | 0.0533          | -0.6417        | -0.1088      | -0.3982         | -0.3417       | 0.0133   | 1.3891          |
| 0.0898        | 5.9259 | 300  | 0.1535          | -0.0110        | -0.0684          | 0.8111             | 0.0574          | -0.6839        | -0.1101      | -0.3960         | -0.3402       | 0.0136   | 1.3989          |
| 0.0928        | 6.9136 | 350  | 0.1572          | -0.0113        | -0.0679          | 0.7889             | 0.0567          | -0.6794        | -0.1125      | -0.3949         | -0.3394       | 0.0140   | 1.4318          |
| 0.0855        | 7.9012 | 400  | 0.1578          | -0.0112        | -0.0722          | 0.8000             | 0.0609          | -0.7215        | -0.1125      | -0.3935         | -0.3375       | 0.0138   | 1.4394          |
| 0.0985        | 8.8889 | 450  | 0.1574          | -0.0112        | -0.0720          | 0.8000             | 0.0608          | -0.7205        | -0.1122      | -0.3934         | -0.3372       | 0.0138   | 1.4358          |
| 0.0859        | 9.8765 | 500  | 0.1582          | -0.0113        | -0.0724          | 0.7889             | 0.0611          | -0.7239        | -0.1129      | -0.3937         | -0.3373       | 0.0140   | 1.4419          |
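
For reference, the reported validation Loss is consistent with the Sft Loss plus a 0.1-weighted Odds Ratio Loss; this weighting is inferred from the numbers above and is not stated in the card. For example, at step 200: 0.0132 + 0.1 × 1.3828 ≈ 0.1515, and at step 150: 0.0131 + 0.1 × 1.3988 ≈ 0.1530.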

Framework versions

  • PEFT 0.12.0
  • Transformers 4.45.2
  • Pytorch 2.3.0
  • Datasets 2.19.0
  • Tokenizers 0.20.0