Llama0-3-8b-ultra-p-0.075

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.5063
Rewards/chosen: -0.9837
Rewards/rejected: -1.9526
Rewards/accuracies: 0.7344
Rewards/margins: 0.9689
Logps/rejected: -459.9252
Logps/chosen: -354.9268
Logits/rejected: 0.8576
Logits/chosen: 0.7216
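
The reward figures above are internally consistent: the reported margin equals the chosen reward minus the rejected reward. The sketch below checks that arithmetic. It assumes the metrics follow the convention common to preference-optimization trainers, where the margin is the chosen reward minus the rejected reward; the card itself does not name the training method, and the dictionary name `final_eval` is only illustrative.

```python
# Final evaluation metrics copied from the list above.
final_eval = {
    "rewards/chosen": -0.9837,
    "rewards/rejected": -1.9526,
    "rewards/margins": 0.9689,
}

# Assumed convention: margin = chosen reward - rejected reward.
computed_margin = final_eval["rewards/chosen"] - final_eval["rewards/rejected"]

# Allow for rounding to four decimal places in the reported values.
margin_matches = abs(computed_margin - final_eval["rewards/margins"]) < 1e-4
print(f"computed margin: {computed_margin:.4f}, matches report: {margin_matches}")
```

A positive margin means that, on average, the evaluation set's chosen responses receive a higher reward than the rejected ones.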

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 2
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 8
total_train_batch_size: 128
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 2.0
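
The total batch sizes listed above follow from the per-device values, the device count, and the gradient accumulation steps. The sketch below reproduces that arithmetic and adds a plain-Python version of a linear learning-rate decay. It assumes no warmup, since none is listed in the card, and the `total_steps` value in the usage line is a hypothetical round number, not the actual length of this run.

```python
# Values copied from the hyperparameter list above.
per_device_train_batch_size = 2
per_device_eval_batch_size = 8
num_devices = 8
gradient_accumulation_steps = 8

# Effective batch size per optimizer step.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
# Evaluation uses no gradient accumulation.
total_eval_batch_size = per_device_eval_batch_size * num_devices


def linear_lr(step, total_steps, base_lr=5e-07):
    """Linear decay from base_lr to 0 (assumption: no warmup phase)."""
    remaining = max(0.0, (total_steps - step) / total_steps)
    return base_lr * remaining


print(total_train_batch_size, total_eval_batch_size)
# Hypothetical schedule length of 1000 steps, for illustration only.
print(linear_lr(500, total_steps=1000))
```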

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.5989	0.2060	100	0.5953	-0.3911	-0.6332	0.6875	0.2421	-327.9836	-295.6613	0.3186	0.2570
0.5722	0.4119	200	0.5672	-0.4880	-0.8924	0.6797	0.4044	-353.9008	-305.3550	0.3285	0.2473
0.5491	0.6179	300	0.5534	-0.5787	-1.1034	0.6797	0.5246	-374.9990	-314.4276	0.5180	0.3959
0.5365	0.8239	400	0.5356	-0.6519	-1.3048	0.7188	0.6529	-395.1465	-321.7464	0.6059	0.4801
0.4994	1.0299	500	0.5203	-0.8521	-1.6829	0.7422	0.8307	-432.9504	-341.7678	0.7006	0.5577
0.4457	1.2358	600	0.5152	-1.1329	-2.1082	0.7031	0.9753	-475.4800	-369.8448	0.8877	0.7498
0.4575	1.4418	700	0.5080	-0.9937	-1.9490	0.7344	0.9553	-459.5659	-355.9217	0.8472	0.7076
0.4565	1.6478	800	0.5054	-1.0354	-2.0196	0.7344	0.9842	-466.6190	-360.0945	0.8950	0.7597
0.4618	1.8538	900	0.5058	-1.0069	-1.9906	0.7344	0.9837	-463.7250	-357.2453	0.8660	0.7293
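
The table can also be used to estimate the run's scale and to locate the checkpoint with the lowest validation loss. The sketch below is a rough reading of the logged values, not a statement about the underlying dataset: it estimates steps per epoch from the step and epoch columns and multiplies by the total train batch size of 128 from the hyperparameter list. The variable names are illustrative.

```python
# (epoch, step, validation_loss) copied from the table above.
rows = [
    (0.2060, 100, 0.5953),
    (0.4119, 200, 0.5672),
    (0.6179, 300, 0.5534),
    (0.8239, 400, 0.5356),
    (1.0299, 500, 0.5203),
    (1.2358, 600, 0.5152),
    (1.4418, 700, 0.5080),
    (1.6478, 800, 0.5054),
    (1.8538, 900, 0.5058),
]
total_train_batch_size = 128  # from the hyperparameter list

# Each row gives an independent estimate of optimizer steps per epoch.
steps_per_epoch = sum(step / epoch for epoch, step, _ in rows) / len(rows)

# Approximate number of training examples seen per epoch.
approx_examples_per_epoch = steps_per_epoch * total_train_batch_size

# Logged evaluation point with the lowest validation loss.
best_epoch, best_step, best_loss = min(rows, key=lambda r: r[2])

print(f"steps per epoch ~ {steps_per_epoch:.1f}")
print(f"examples per epoch ~ {approx_examples_per_epoch:.0f}")
print(f"lowest validation loss {best_loss} at step {best_step}")
```

Among the logged steps, validation loss is lowest at step 800 and is essentially flat from step 700 onward, which suggests most of the improvement happens in the first 1.5 epochs.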

Framework versions

Transformers 4.45.1
PyTorch 2.4.1+cu121
Datasets 3.0.0
Tokenizers 0.20.0

Repository: tongliuphysics/Llama0-3-8b-ultra-p-0.075
