# llm3br256
This model is a fine-tuned version of [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) on the gommt dataset. It achieves the following results on the evaluation set:
- Loss: 0.0206
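
The framework versions below list PEFT, so the released weights are presumably a parameter-efficient adapter on top of the base model. Under that assumption, a minimal inference sketch might look like this (the repo id `sizhkhy/gommt` is taken from the model listing; adjust it if the adapter is hosted elsewhere):

```python
# Minimal inference sketch. Assumes the checkpoint is a PEFT adapter for
# meta-llama/Llama-3.2-3B-Instruct; the adapter repo id is an assumption.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "sizhkhy/gommt"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```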
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 25.0
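
For reference, here is a minimal sketch of how these settings might map onto `transformers.TrainingArguments`. The actual training script is not published, so everything beyond the values listed above (e.g. the output directory) is an assumption:

```python
# Hypothetical reconstruction of the configuration above; the actual
# training script for this model is not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llm3br256",          # assumed output path
    learning_rate=1e-4,
    per_device_train_batch_size=64,  # listed batch size; accumulation unknown
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=25.0,
)
```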
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.232         | 0.1613 | 25   | 0.2193          |
| 0.1524        | 0.3226 | 50   | 0.1507          |
| 0.115         | 0.4839 | 75   | 0.1165          |
| 0.0875        | 0.6452 | 100  | 0.1004          |
| 0.092         | 0.8065 | 125  | 0.0909          |
| 0.1077        | 0.9677 | 150  | 0.0900          |
| 0.0688        | 1.1290 | 175  | 0.0778          |
| 0.0682        | 1.2903 | 200  | 0.0723          |
| 0.0621        | 1.4516 | 225  | 0.0668          |
| 0.0668        | 1.6129 | 250  | 0.0646          |
| 0.0672        | 1.7742 | 275  | 0.0587          |
| 0.0484        | 1.9355 | 300  | 0.0544          |
| 0.0468        | 2.0968 | 325  | 0.0516          |
| 0.0438        | 2.2581 | 350  | 0.0503          |
| 0.0364        | 2.4194 | 375  | 0.0493          |
| 0.0365        | 2.5806 | 400  | 0.0460          |
| 0.0469        | 2.7419 | 425  | 0.0432          |
| 0.027         | 2.9032 | 450  | 0.0379          |
| 0.026         | 3.0645 | 475  | 0.0356          |
| 0.0223        | 3.2258 | 500  | 0.0357          |
| 0.0228        | 3.3871 | 525  | 0.0352          |
| 0.0199        | 3.5484 | 550  | 0.0336          |
| 0.0227        | 3.7097 | 575  | 0.0308          |
| 0.0207        | 3.8710 | 600  | 0.0292          |
| 0.0125        | 4.0323 | 625  | 0.0304          |
| 0.0146        | 4.1935 | 650  | 0.0279          |
| 0.0126        | 4.3548 | 675  | 0.0283          |
| 0.0141        | 4.5161 | 700  | 0.0270          |
| 0.0133        | 4.6774 | 725  | 0.0254          |
| 0.0098        | 4.8387 | 750  | 0.0250          |
| 0.0093        | 5.0    | 775  | 0.0234          |
| 0.0073        | 5.1613 | 800  | 0.0247          |
| 0.0087        | 5.3226 | 825  | 0.0254          |
| 0.0102        | 5.4839 | 850  | 0.0242          |
| 0.0077        | 5.6452 | 875  | 0.0230          |
| 0.0085        | 5.8065 | 900  | 0.0230          |
| 0.0069        | 5.9677 | 925  | 0.0213          |
| 0.0056        | 6.1290 | 950  | 0.0226          |
| 0.0063        | 6.2903 | 975  | 0.0224          |
| 0.0055        | 6.4516 | 1000 | 0.0227          |
| 0.0067        | 6.6129 | 1025 | 0.0229          |
| 0.0052        | 6.7742 | 1050 | 0.0224          |
| 0.008         | 6.9355 | 1075 | 0.0219          |
| 0.0053        | 7.0968 | 1100 | 0.0227          |
| 0.0049        | 7.2581 | 1125 | 0.0220          |
| 0.0059        | 7.4194 | 1150 | 0.0218          |
| 0.0045        | 7.5806 | 1175 | 0.0215          |
| 0.0058        | 7.7419 | 1200 | 0.0206          |
| 0.0047        | 7.9032 | 1225 | 0.0207          |
| 0.0043        | 8.0645 | 1250 | 0.0223          |
| 0.0046        | 8.2258 | 1275 | 0.0218          |
| 0.0036        | 8.3871 | 1300 | 0.0225          |
| 0.0034        | 8.5484 | 1325 | 0.0216          |
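
The log ends at epoch 8.55 even though num_epochs was set to 25.0, and the reported evaluation loss of 0.0206 matches the step-1200 row rather than the last row, which suggests (though the card does not say) that training stopped early and the best checkpoint was kept. Assuming the reported loss is mean token-level cross-entropy in nats, it converts to perplexity as follows:

```python
# Convert a mean cross-entropy loss (in nats) to perplexity.
# Assumes the reported eval loss is token-level cross-entropy.
import math

eval_loss = 0.0206
print(f"perplexity = {math.exp(eval_loss):.4f}")  # ≈ 1.0208
```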
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.4.0+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3