# collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1
This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.9361
- Num Input Tokens Seen: 21319328
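
As a quick check of the checkpoint, here is a minimal inference sketch. It assumes the model is loaded from this repository (`RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1`); the dtype, device placement, and prompt are illustrative choices, not settings confirmed by this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative: a 27B model generally needs bf16/fp16 or quantization
    device_map="auto",           # requires the `accelerate` package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```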
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
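
For reference, here is a sketch of how the settings above map onto `transformers.TrainingArguments`. The argument names are standard in Transformers 4.44; the output directory is a hypothetical placeholder, and anything not listed above is left at its default:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above. With
# per_device_train_batch_size=4 and gradient_accumulation_steps=32 on a
# single device, the effective (total) train batch size is 4 * 32 = 128.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```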
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.1282 | 0 |
2.9736 | 0.0120 | 5 | 1.0890 | 257960 |
2.9201 | 0.0240 | 10 | 1.0094 | 513580 |
2.7063 | 0.0360 | 15 | 0.9961 | 772336 |
2.7066 | 0.0479 | 20 | 0.9889 | 1027836 |
2.5663 | 0.0599 | 25 | 0.9868 | 1285388 |
2.586 | 0.0719 | 30 | 0.9896 | 1541024 |
2.5497 | 0.0839 | 35 | 0.9909 | 1796588 |
2.325 | 0.0959 | 40 | 0.9916 | 2051248 |
2.1303 | 0.1079 | 45 | 0.9928 | 2316512 |
2.1498 | 0.1198 | 50 | 0.9901 | 2575448 |
2.1035 | 0.1318 | 55 | 0.9887 | 2827576 |
2.0106 | 0.1438 | 60 | 0.9895 | 3085924 |
1.9861 | 0.1558 | 65 | 0.9849 | 3344592 |
1.8483 | 0.1678 | 70 | 0.9882 | 3587496 |
1.698 | 0.1798 | 75 | 0.9837 | 3845228 |
1.5455 | 0.1917 | 80 | 0.9820 | 4094024 |
1.7371 | 0.2037 | 85 | 0.9779 | 4352288 |
1.6068 | 0.2157 | 90 | 0.9755 | 4606816 |
1.6234 | 0.2277 | 95 | 0.9705 | 4865000 |
1.6119 | 0.2397 | 100 | 0.9710 | 5122860 |
1.4461 | 0.2517 | 105 | 0.9661 | 5380192 |
1.5323 | 0.2637 | 110 | 0.9648 | 5638952 |
1.48 | 0.2756 | 115 | 0.9644 | 5895124 |
1.5077 | 0.2876 | 120 | 0.9632 | 6150672 |
1.3105 | 0.2996 | 125 | 0.9605 | 6404592 |
1.5438 | 0.3116 | 130 | 0.9604 | 6667232 |
1.6025 | 0.3236 | 135 | 0.9587 | 6919444 |
1.5647 | 0.3356 | 140 | 0.9575 | 7171560 |
1.3177 | 0.3475 | 145 | 0.9598 | 7427412 |
1.4743 | 0.3595 | 150 | 0.9563 | 7690832 |
1.6544 | 0.3715 | 155 | 0.9547 | 7949984 |
1.397 | 0.3835 | 160 | 0.9584 | 8205800 |
1.3666 | 0.3955 | 165 | 0.9543 | 8464028 |
1.5154 | 0.4075 | 170 | 0.9527 | 8713484 |
1.5427 | 0.4194 | 175 | 0.9557 | 8971692 |
1.2568 | 0.4314 | 180 | 0.9521 | 9225284 |
1.3871 | 0.4434 | 185 | 0.9520 | 9479360 |
1.5084 | 0.4554 | 190 | 0.9521 | 9730040 |
1.4411 | 0.4674 | 195 | 0.9499 | 9989888 |
1.3642 | 0.4794 | 200 | 0.9487 | 10253880 |
1.2564 | 0.4913 | 205 | 0.9472 | 10506892 |
1.4515 | 0.5033 | 210 | 0.9496 | 10762052 |
1.2647 | 0.5153 | 215 | 0.9494 | 11010792 |
1.3365 | 0.5273 | 220 | 0.9491 | 11258360 |
1.4796 | 0.5393 | 225 | 0.9486 | 11509984 |
1.4464 | 0.5513 | 230 | 0.9468 | 11768156 |
1.1882 | 0.5633 | 235 | 0.9482 | 12022340 |
1.4812 | 0.5752 | 240 | 0.9485 | 12270644 |
1.3927 | 0.5872 | 245 | 0.9466 | 12529864 |
1.5076 | 0.5992 | 250 | 0.9475 | 12788428 |
1.3727 | 0.6112 | 255 | 0.9459 | 13039508 |
1.2361 | 0.6232 | 260 | 0.9476 | 13292956 |
1.3745 | 0.6352 | 265 | 0.9443 | 13548132 |
1.3198 | 0.6471 | 270 | 0.9442 | 13805636 |
1.2179 | 0.6591 | 275 | 0.9436 | 14058880 |
1.4035 | 0.6711 | 280 | 0.9463 | 14318400 |
1.2952 | 0.6831 | 285 | 0.9440 | 14568908 |
1.291 | 0.6951 | 290 | 0.9439 | 14823440 |
1.4132 | 0.7071 | 295 | 0.9436 | 15082248 |
1.5722 | 0.7190 | 300 | 0.9429 | 15338164 |
1.2473 | 0.7310 | 305 | 0.9416 | 15601888 |
1.2805 | 0.7430 | 310 | 0.9420 | 15855996 |
1.1853 | 0.7550 | 315 | 0.9401 | 16103316 |
1.4429 | 0.7670 | 320 | 0.9411 | 16354352 |
1.0744 | 0.7790 | 325 | 0.9417 | 16609264 |
1.2779 | 0.7910 | 330 | 0.9432 | 16869072 |
1.4178 | 0.8029 | 335 | 0.9407 | 17125932 |
1.3986 | 0.8149 | 340 | 0.9414 | 17379164 |
1.1471 | 0.8269 | 345 | 0.9404 | 17628696 |
1.1763 | 0.8389 | 350 | 0.9426 | 17884156 |
1.2251 | 0.8509 | 355 | 0.9389 | 18134160 |
1.2366 | 0.8629 | 360 | 0.9409 | 18391736 |
1.3086 | 0.8748 | 365 | 0.9392 | 18644984 |
1.2506 | 0.8868 | 370 | 0.9405 | 18902772 |
1.355 | 0.8988 | 375 | 0.9384 | 19165216 |
1.3424 | 0.9108 | 380 | 0.9400 | 19415060 |
1.3585 | 0.9228 | 385 | 0.9390 | 19668820 |
1.3487 | 0.9348 | 390 | 0.9425 | 19922732 |
1.4113 | 0.9467 | 395 | 0.9402 | 20187160 |
1.5089 | 0.9587 | 400 | 0.9377 | 20438732 |
1.3723 | 0.9707 | 405 | 0.9376 | 20699200 |
1.2797 | 0.9827 | 410 | 0.9422 | 20957600 |
1.3996 | 0.9947 | 415 | 0.9367 | 21217992 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
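
Pinned as a `requirements.txt`, these versions would look like the following (the `+cu121` tag on the PyTorch version denotes a CUDA 12.1 build, which comes from the PyTorch wheel index rather than a plain version pin):

```
transformers==4.44.0
torch==2.4.0
datasets==2.20.0
tokenizers==0.19.1
```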