Llama-3.1-8B-Instruct-SAA-400

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the bct_non_cot_dpo_400 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1181
  • Rewards/chosen: -0.0081
  • Rewards/rejected: -0.0551
  • Rewards/accuracies: 0.8000
  • Rewards/margins: 0.0470
  • Logps/rejected: -0.5510
  • Logps/chosen: -0.0815
  • Logits/rejected: -0.3587
  • Logits/chosen: -0.3145
  • Sft Loss: 0.0122
  • Odds Ratio Loss: 1.0589
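
The reward metrics above are consistent with ORPO-style rewards, i.e. a scaled average per-token log-probability of each completion. A minimal sketch of that relationship, assuming a reward scale of beta = 0.1 (the beta value is an assumption, not stated in this card):

```python
# Sketch: ORPO-style rewards as beta * average log-probability.
# beta = 0.1 is an assumption; it is not stated in this model card.
BETA = 0.1

def reward(avg_logp: float, beta: float = BETA) -> float:
    """Scaled log-probability reward for a completion."""
    return beta * avg_logp

# Eval-set log-probs reported above:
reward_chosen = reward(-0.0815)    # ~ Rewards/chosen
reward_rejected = reward(-0.5510)  # ~ Rewards/rejected
margin = reward_chosen - reward_rejected  # ~ Rewards/margins
```

Under this assumption, the reported Rewards/chosen, Rewards/rejected, and Rewards/margins all follow from the Logps values to within rounding.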

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
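
The warmup-plus-cosine schedule above can be sketched as follows; the exact warmup step count depends on the trainer's internals, so this mirrors the usual linear-warmup/cosine-decay shape rather than reproducing it exactly:

```python
import math

PEAK_LR = 5e-6       # learning_rate
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio

def lr_at_step(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    warmup_steps = int(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))
```

Note also that the effective batch size of 16 is just train_batch_size (2) times gradient_accumulation_steps (8).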

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Sft Loss | Odds Ratio Loss |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:---------------:|
| 1.0959        | 2.2222 | 50   | 0.8812          | -0.0833        | -0.1278          | 0.8000             | 0.0444          | -1.2775        | -0.8334      | -0.4147         | -0.3542       | 0.0993   | 7.8184          |
| 0.2448        | 4.4444 | 100  | 0.1774          | -0.0141        | -0.0615          | 0.8000             | 0.0474          | -0.6147        | -0.1406      | -0.3962         | -0.3451       | 0.0187   | 1.5871          |
| 0.1229        | 6.6667 | 150  | 0.1202          | -0.0083        | -0.0555          | 0.7750             | 0.0472          | -0.5554        | -0.0834      | -0.3636         | -0.3187       | 0.0124   | 1.0785          |
| 0.1265        | 8.8889 | 200  | 0.1181          | -0.0081        | -0.0551          | 0.8000             | 0.0470          | -0.5510        | -0.0815      | -0.3587         | -0.3145       | 0.0122   | 1.0589          |
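
The fractional epoch values in the results follow from the effective batch size: with a total train batch size of 16, 50 steps correspond to 2.2222 epochs, implying 22.5 optimizer steps per epoch, i.e. roughly 360 training examples (the exact train/eval split of the 400-example dataset is an assumption). A quick check:

```python
# Sketch: back out steps-per-epoch from the results table.
# The ~360-example train split is inferred, not stated in the card.
TOTAL_BATCH = 16                     # total_train_batch_size
steps_per_epoch = 360 / TOTAL_BATCH  # 22.5 optimizer steps per epoch

def epochs_at(step: int) -> float:
    """Epoch count after a given optimizer step, rounded as in the table."""
    return round(step / steps_per_epoch, 4)
```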

Framework versions

  • PEFT 0.12.0
  • Transformers 4.45.2
  • PyTorch 2.3.0
  • Datasets 2.19.0
  • Tokenizers 0.20.0

Model tree for chchen/Llama-3.1-8B-Instruct-SAA-400

  • Adapter of meta-llama/Llama-3.1-8B-Instruct