Llama-31-8B_task-2_60-samples_config-4

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the GaetanMichelet/chat-60_ft_task-2 dataset. It achieves the following results on the evaluation set:

Loss: 0.7166 (the lowest validation loss in the training results below, logged at epoch 49.7391, step 143)
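
The card does not say what the loss measures. This sketch assumes it is the usual mean per-token cross-entropy in nats, which is the standard quantity for causal-LM fine-tuning with the Transformers Trainer. Under that assumption, the loss converts to perplexity as exp(loss):

```python
import math

# Evaluation loss reported above; ASSUMED to be mean per-token cross-entropy in nats.
eval_loss = 0.7166

# Perplexity is the exponential of the mean negative log-likelihood per token.
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.3f}")
```

Under that assumption the model's perplexity on the evaluation set is about 2.05.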

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 16
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 150
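
The sketch below shows how the listed schedule and batch settings fit together. It assumes two things:

- The schedule has the shape produced by the Transformers `get_cosine_schedule_with_warmup` helper with its default single half-cycle: a linear warmup, then a cosine decay to zero.
- The warmup length is `warmup_ratio * total_steps`, rounded up.

The card does not state the planned number of optimizer steps, so `total_steps` is a free parameter. The helper names `effective_batch_size` and `cosine_lr` are hypothetical. The sketch also shows that `total_train_batch_size` is the per-device batch size times the gradient accumulation steps (times the number of data-parallel processes, assumed to be 1 here).

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
WARMUP_RATIO = 0.1
TRAIN_BATCH_SIZE = 1             # per-device batch size
GRADIENT_ACCUMULATION_STEPS = 16


def effective_batch_size(per_device: int, accumulation: int, num_processes: int = 1) -> int:
    """Examples contributing to one optimizer step (num_processes is an assumed parameter)."""
    return per_device * accumulation * num_processes


def cosine_lr(step: int, total_steps: int, base_lr: float = LEARNING_RATE,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 16

total = 300  # HYPOTHETICAL planned step count, for illustration only
for s in (0, 15, 30, 165, 300):
    print(s, f"{cosine_lr(s, total):.2e}")
```

The learning rate peaks at 1e-05 when warmup ends and then decays smoothly. With a small per-device batch of 1, gradient accumulation is what brings the effective batch up to 16.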

Training results

Training Loss	Epoch	Step	Validation Loss
1.0749	0.6957	2	1.0966
1.0739	1.7391	5	1.0942
1.0883	2.7826	8	1.0905
1.0572	3.8261	11	1.0844
1.0814	4.8696	14	1.0741
1.0423	5.9130	17	1.0622
1.0626	6.9565	20	1.0462
1.0118	8.0	23	1.0248
1.0176	8.6957	25	1.0099
0.9728	9.7391	28	0.9822
0.9567	10.7826	31	0.9527
0.9202	11.8261	34	0.9259
0.9099	12.8696	37	0.9015
0.8806	13.9130	40	0.8828
0.7975	14.9565	43	0.8661
0.8572	16.0	46	0.8533
0.8342	16.6957	48	0.8447
0.8242	17.7391	51	0.8331
0.7954	18.7826	54	0.8223
0.8235	19.8261	57	0.8122
0.7896	20.8696	60	0.8017
0.7775	21.9130	63	0.7933
0.7315	22.9565	66	0.7862
0.7702	24.0	69	0.7800
0.7262	24.6957	71	0.7756
0.7683	25.7391	74	0.7715
0.7043	26.7826	77	0.7656
0.7314	27.8261	80	0.7621
0.7093	28.8696	83	0.7586
0.7047	29.9130	86	0.7542
0.707	30.9565	89	0.7506
0.7128	32.0	92	0.7475
0.676	32.6957	94	0.7451
0.7113	33.7391	97	0.7420
0.6733	34.7826	100	0.7396
0.698	35.8261	103	0.7370
0.6868	36.8696	106	0.7339
0.6633	37.9130	109	0.7310
0.675	38.9565	112	0.7296
0.6563	40.0	115	0.7270
0.64	40.6957	117	0.7257
0.6314	41.7391	120	0.7242
0.619	42.7826	123	0.7225
0.6256	43.8261	126	0.7211
0.634	44.8696	129	0.7198
0.5984	45.9130	132	0.7185
0.636	46.9565	135	0.7176
0.6084	48.0	138	0.7173
0.6068	48.6957	140	0.7168
0.5982	49.7391	143	0.7166
0.6024	50.7826	146	0.7171
0.5876	51.8261	149	0.7170
0.5852	52.8696	152	0.7169
0.5803	53.9130	155	0.7175
0.5794	54.9565	158	0.7172
0.5699	56.0	161	0.7188
0.5722	56.6957	163	0.7192
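
The headline loss can be cross-checked against this table. The sketch copies the Step and Validation Loss columns and finds the minimum. The card does not say whether early stopping or best-checkpoint selection was used, so the sketch only confirms where the minimum falls.

```python
# (step, validation_loss) pairs copied from the Training results table above.
VALIDATION = [
    (2, 1.0966), (5, 1.0942), (8, 1.0905), (11, 1.0844), (14, 1.0741), (17, 1.0622),
    (20, 1.0462), (23, 1.0248), (25, 1.0099), (28, 0.9822), (31, 0.9527), (34, 0.9259),
    (37, 0.9015), (40, 0.8828), (43, 0.8661), (46, 0.8533), (48, 0.8447), (51, 0.8331),
    (54, 0.8223), (57, 0.8122), (60, 0.8017), (63, 0.7933), (66, 0.7862), (69, 0.7800),
    (71, 0.7756), (74, 0.7715), (77, 0.7656), (80, 0.7621), (83, 0.7586), (86, 0.7542),
    (89, 0.7506), (92, 0.7475), (94, 0.7451), (97, 0.7420), (100, 0.7396), (103, 0.7370),
    (106, 0.7339), (109, 0.7310), (112, 0.7296), (115, 0.7270), (117, 0.7257), (120, 0.7242),
    (123, 0.7225), (126, 0.7211), (129, 0.7198), (132, 0.7185), (135, 0.7176), (138, 0.7173),
    (140, 0.7168), (143, 0.7166), (146, 0.7171), (149, 0.7170), (152, 0.7169), (155, 0.7175),
    (158, 0.7172), (161, 0.7188), (163, 0.7192),
]

# Row with the lowest validation loss.
best_step, best_loss = min(VALIDATION, key=lambda row: row[1])
# Number of evaluations logged after the best one.
evals_after_best = sum(1 for step, _ in VALIDATION if step > best_step)

print(f"best validation loss {best_loss} at step {best_step}")
print(f"{evals_after_best} later evaluations, none lower")
```

The minimum, 0.7166 at step 143, matches the loss reported at the top of the card. Validation loss then drifts slightly upward while training loss keeps falling.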

Framework versions

PEFT 0.12.0
Transformers 4.44.0
PyTorch 2.1.2+cu121
Datasets 2.20.0
Tokenizers 0.19.1
