# Llama-3.1-8B-Instruct-SAA-900
This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the bct_non_cot_dpo_900 dataset. It achieves the following results on the evaluation set:
- Loss: 0.1515
- Rewards/chosen: -0.0108
- Rewards/rejected: -0.0582
- Rewards/accuracies: 0.8222
- Rewards/margins: 0.0474
- Logps/rejected: -0.5819
- Logps/chosen: -0.1084
- Logits/rejected: -0.4031
- Logits/chosen: -0.3480
- Sft Loss: 0.0132
- Odds Ratio Loss: 1.3828
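The presence of both an SFT loss and an odds-ratio loss suggests an ORPO-style objective (as implemented in, e.g., TRL or LLaMA-Factory, though the card does not say which trainer was used). The reported rewards are consistent with the length-averaged log-probabilities scaled by a weight of roughly 0.1 (e.g. -0.1084 × 0.1 ≈ -0.0108). A minimal sketch of that combined loss, assuming length-averaged log-probabilities per example and a hypothetical `beta`:

```python
import torch
import torch.nn.functional as F

def orpo_loss(logp_chosen: torch.Tensor, logp_rejected: torch.Tensor, beta: float = 0.1):
    """ORPO-style objective: SFT NLL on the chosen response plus a
    weighted log-odds-ratio penalty. Inputs are length-averaged
    log-probabilities per example; beta = 0.1 is inferred from the
    reported rewards above, not read from this run's config."""
    # SFT term: negative mean log-likelihood of the chosen responses.
    sft = -logp_chosen.mean()
    # Log-odds of each sequence, log(p / (1 - p)), computed in log space.
    log_odds = (logp_chosen - logp_rejected) - (
        torch.log1p(-torch.exp(logp_chosen)) - torch.log1p(-torch.exp(logp_rejected))
    )
    # Odds-ratio term: pushes chosen odds above rejected odds.
    odds_ratio = -F.logsigmoid(log_odds).mean()
    return sft + beta * odds_ratio, sft, odds_ratio
```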
## Model description

More information needed
## Intended uses & limitations

More information needed
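In the absence of documented usage, here is a minimal inference sketch. It assumes this repository hosts a PEFT (LoRA-style) adapter on top of meta-llama/Llama-3.1-8B-Instruct, as the PEFT framework version listed below suggests; the prompt is a placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Base model is gated on the Hub; device_map="auto" requires accelerate.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype="auto", device_map="auto"
)
# Load this repo as an adapter on top of the base model (assumption:
# the repo contains PEFT adapter weights, not a full merged model).
model = PeftModel.from_pretrained(base, "chchen/Llama-3.1-8B-Instruct-SAA-900")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```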
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
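These settings imply an effective batch size of 2 × 8 = 16, matching the total_train_batch_size listed above. A minimal sketch expressing them as transformers `TrainingArguments` (argument names from Transformers 4.45; the output directory and anything not listed above are placeholder assumptions):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama-3.1-8b-instruct-saa-900",  # hypothetical path
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,  # 2 x 8 = 16 effective train batch
    seed=42,
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```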
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Sft Loss | Odds Ratio Loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.5773 | 0.9877 | 50 | 1.3696 | -0.1315 | -0.1754 | 0.7667 | 0.0440 | -1.7544 | -1.3147 | -0.4663 | -0.4034 | 0.1831 | 11.8657 |
| 0.2518 | 1.9753 | 100 | 0.2349 | -0.0190 | -0.0732 | 0.8111 | 0.0542 | -0.7321 | -0.1898 | -0.4483 | -0.3781 | 0.0216 | 2.1323 |
| 0.1304 | 2.9630 | 150 | 0.1530 | -0.0109 | -0.0612 | 0.8111 | 0.0502 | -0.6117 | -0.1094 | -0.4032 | -0.3454 | 0.0131 | 1.3988 |
| 0.1129 | 3.9506 | 200 | 0.1515 | -0.0108 | -0.0582 | 0.8222 | 0.0474 | -0.5819 | -0.1084 | -0.4031 | -0.3480 | 0.0132 | 1.3828 |
| 0.1194 | 4.9383 | 250 | 0.1522 | -0.0109 | -0.0642 | 0.8222 | 0.0533 | -0.6417 | -0.1088 | -0.3982 | -0.3417 | 0.0133 | 1.3891 |
| 0.0898 | 5.9259 | 300 | 0.1535 | -0.0110 | -0.0684 | 0.8111 | 0.0574 | -0.6839 | -0.1101 | -0.3960 | -0.3402 | 0.0136 | 1.3989 |
| 0.0928 | 6.9136 | 350 | 0.1572 | -0.0113 | -0.0679 | 0.7889 | 0.0567 | -0.6794 | -0.1125 | -0.3949 | -0.3394 | 0.0140 | 1.4318 |
| 0.0855 | 7.9012 | 400 | 0.1578 | -0.0112 | -0.0722 | 0.8000 | 0.0609 | -0.7215 | -0.1125 | -0.3935 | -0.3375 | 0.0138 | 1.4394 |
| 0.0985 | 8.8889 | 450 | 0.1574 | -0.0112 | -0.0720 | 0.8000 | 0.0608 | -0.7205 | -0.1122 | -0.3934 | -0.3372 | 0.0138 | 1.4358 |
| 0.0859 | 9.8765 | 500 | 0.1582 | -0.0113 | -0.0724 | 0.7889 | 0.0611 | -0.7239 | -0.1129 | -0.3937 | -0.3373 | 0.0140 | 1.4419 |
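The headline evaluation results above match the step-200 row (epoch ≈ 3.95), which also has the lowest validation loss (0.1515). Later checkpoints drift slightly upward, which is consistent with the best checkpoint being kept at the end of training, though the card does not state this explicitly.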
### Framework versions

- PEFT 0.12.0
- Transformers 4.45.2
- PyTorch 2.3.0
- Datasets 2.19.0
- Tokenizers 0.20.0
## Model tree for chchen/Llama-3.1-8B-Instruct-SAA-900

- Base model: meta-llama/Llama-3.1-8B
- Fine-tuned from: meta-llama/Llama-3.1-8B-Instruct