Llama-3.1-8B-Instruct-KTO-100

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_100 dataset. It achieves the following results on the evaluation set (these match the final logged evaluation at step 100):

Loss: 0.4997
Rewards/chosen: 0.0050
Logps/chosen: -17.0744
Logits/chosen: -5053702.8571
Rewards/rejected: 0.0078
Logps/rejected: -23.8299
Logits/rejected: -7957526.6667
Rewards/margins: -0.0028
Kl: 0.0
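
As a quick consistency check on the numbers above, the reported reward margin appears to be the chosen reward minus the rejected reward. The sketch below assumes that definition (it is how KTO-style trainers commonly log the metric, but the card does not state it), and the helper name reward_margin is hypothetical.

```python
def reward_margin(chosen: float, rejected: float) -> float:
    """Assumed definition: mean chosen reward minus mean rejected reward."""
    return chosen - rejected


# Values copied from the evaluation results reported in this card.
final_margin = reward_margin(0.0050, 0.0078)   # final evaluation (step 100)
step50_margin = reward_margin(-0.0014, 0.0189)  # evaluation at step 50

# Compare against the reported margins, allowing for 4-decimal rounding.
final_ok = abs(final_margin - (-0.0028)) < 2e-4
step50_ok = abs(step50_margin - (-0.0204)) < 2e-4
print(final_ok, step50_ok)
```

Under this definition, a negative margin means the rejected completions received a slightly higher implicit reward than the chosen ones on the evaluation set.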

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 16
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0
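
The relationship between these settings can be sketched in a few lines. This is a minimal illustration, not the training code: it assumes a single device (so 2 x 8 = 16), linear warmup followed by cosine decay to zero, and a hypothetical total_steps value, since the card does not report the total number of optimizer steps. The function names are also hypothetical.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * grad_accum * num_devices


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 5e-6, warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero (assumed schedule shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(2, 8))  # matches total_train_batch_size

# total_steps=100 is a hypothetical value chosen only for illustration.
for step in (0, 10, 55, 100):
    print(step, lr_at_step(step, total_steps=100))
```

With these assumptions the learning rate starts at zero, reaches 5e-06 at the end of warmup, and decays back toward zero by the final step.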

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Logps/chosen	Logits/chosen	Rewards/rejected	Logps/rejected	Logits/rejected	Rewards/margins	Kl
0.4944	4.4444	50	0.5018	-0.0014	-17.1389	-5154306.2857	0.0189	-23.7185	-7920785.3333	-0.0204	0.0758
0.4809	8.8889	100	0.4997	0.0050	-17.0744	-5053702.8571	0.0078	-23.8299	-7957526.6667	-0.0028	0.0

Framework versions

PEFT 0.12.0
Transformers 4.46.1
Pytorch 2.5.1+cu124
Datasets 3.1.0
Tokenizers 0.20.3
