Llama0-3-8b-ultra-p-0.05-lr1e-6

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set (a sketch of how these preference-tuning metrics are typically computed follows the list):

  • Loss: 0.5019
  • Rewards/chosen: -1.5493
  • Rewards/rejected: -2.9665
  • Rewards/accuracies: 0.7734
  • Rewards/margins: 1.4171
  • Logps/rejected: -561.3090
  • Logps/chosen: -411.4854
  • Logits/rejected: 0.1251
  • Logits/chosen: 0.1589
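
The reward, margin, and log-probability metric names above match those logged by TRL's DPOTrainer, which suggests (though the card does not confirm) DPO-style preference training. Below is a minimal sketch of how such metrics are typically derived from policy and reference log-probabilities; the β value is an assumption, since the card does not state it:

```python
import torch
import torch.nn.functional as F

def dpo_eval_metrics(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Illustrative reconstruction of the evaluation metrics above.

    Inputs are summed token log-probabilities per sequence (the Logps/*
    quantities) for a batch of preference pairs. `beta` is the DPO
    temperature and is an assumption here -- the card does not state it.
    """
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)      # Rewards/chosen
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = rewards_chosen - rewards_rejected                          # Rewards/margins
    accuracy = (margins > 0).float().mean()                              # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()                                 # standard DPO loss
    return rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracy, loss
```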

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2.0
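
The total train batch size follows from the settings above: 2 per device × 8 GPUs × 8 gradient-accumulation steps = 128. As a hedged sketch, these hyperparameters could be expressed with TRL's DPOConfig (the actual trainer used is not stated in the card; DPO is inferred from the logged metrics):

```python
from trl import DPOConfig  # assumes TRL's DPO trainer was used -- not confirmed by the card

training_args = DPOConfig(
    output_dir="Llama0-3-8b-ultra-p-0.05-lr1e-6",
    learning_rate=1e-6,
    per_device_train_batch_size=2,   # x 8 devices x 8 accumulation steps = 128 effective
    per_device_eval_batch_size=8,    # x 8 devices = 64 effective
    gradient_accumulation_steps=8,
    num_train_epochs=2.0,
    lr_scheduler_type="linear",
    seed=42,
    bf16=True,                       # the repo's weights are stored in BF16
)
```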

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5872        | 0.2060 | 100  | 0.5793          | -0.4159        | -0.7886          | 0.6797             | 0.3727          | -343.5205      | -298.1459    | 0.1012          | 0.0323        |
| 0.5466        | 0.4119 | 200  | 0.5376          | -0.8009        | -1.5283          | 0.7109             | 0.7274          | -417.4928      | -336.6483    | 0.4182          | 0.2978        |
| 0.5219        | 0.6179 | 300  | 0.5154          | -0.7308        | -1.5181          | 0.7422             | 0.7873          | -416.4722      | -329.6319    | 0.3827          | 0.2908        |
| 0.5127        | 0.8239 | 400  | 0.5057          | -0.8269        | -1.7581          | 0.7656             | 0.9312          | -440.4687      | -339.2437    | 0.4026          | 0.3229        |
| 0.4349        | 1.0299 | 500  | 0.5138          | -1.4782        | -2.8787          | 0.7500             | 1.4006          | -552.5379      | -404.3730    | -0.0637         | -0.0172       |
| 0.3498        | 1.2358 | 600  | 0.5116          | -1.7234        | -3.2236          | 0.7812             | 1.5002          | -587.0215      | -428.8901    | 0.1265          | 0.1477        |
| 0.3542        | 1.4418 | 700  | 0.5009          | -1.5883        | -3.0302          | 0.7656             | 1.4418          | -567.6822      | -415.3892    | 0.0483          | 0.0990        |
| 0.3613        | 1.6478 | 800  | 0.4959          | -1.4506        | -2.8100          | 0.7578             | 1.3594          | -545.6597      | -401.6139    | 0.1564          | 0.1835        |
| 0.3586        | 1.8538 | 900  | 0.5056          | -1.6477        | -3.1194          | 0.7500             | 1.4717          | -576.6020      | -421.3250    | 0.1133          | 0.1546        |
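
For scale, the Epoch and Step columns imply roughly 485 optimizer steps per epoch (100 steps ≈ 0.206 epochs), which at the total train batch size of 128 corresponds to a training set of roughly 62,000 preference pairs.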

Framework versions

  • Transformers 4.45.1
  • Pytorch 2.4.1+cu121
  • Datasets 3.0.0
  • Tokenizers 0.20.0
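
A minimal loading sketch using the versions above; this is standard Transformers usage, with the chat template inherited from the Llama-3 Instruct base, and the prompt is only a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tongliuphysics/Llama0-3-8b-ultra-p-0.05-lr1e-6"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```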