# collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd2
This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9477
- Num Input Tokens Seen: 24547368
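The card does not include a usage example, so here is a minimal, hedged loading sketch using the standard `transformers` API. The repo id matches this card's title; `torch_dtype=torch.bfloat16` and `device_map="auto"` are illustrative assumptions, not part of the original card (loading a 27B model this way requires `accelerate` and substantial GPU memory).

```python
# Minimal loading sketch; dtype and device placement are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Hello,", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```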
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch reproducing them appears after the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
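Below is a sketch mapping the listed hyperparameters onto `transformers.TrainingArguments`. The `output_dir` and the optimizer string are assumptions; the Adam betas and epsilon listed above are the library defaults, and the effective batch size of 128 follows from 4 (per device) × 32 (accumulation steps).

```python
# Hedged reconstruction of the training configuration; not the author's script.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd2",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,   # 4 * 32 = 128 total train batch size
    optim="adamw_torch",              # assumed; betas/epsilon above are defaults
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```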
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.1282 | 0 |
3.1929 | 0.0104 | 5 | 1.1054 | 258672 |
3.0522 | 0.0207 | 10 | 1.0218 | 516688 |
2.8837 | 0.0311 | 15 | 0.9980 | 769464 |
2.8939 | 0.0415 | 20 | 0.9908 | 1023524 |
2.5952 | 0.0518 | 25 | 0.9880 | 1278684 |
2.8256 | 0.0622 | 30 | 0.9921 | 1525544 |
2.4783 | 0.0726 | 35 | 0.9969 | 1772664 |
2.4713 | 0.0829 | 40 | 1.0015 | 2028124 |
2.4233 | 0.0933 | 45 | 1.0009 | 2290688 |
2.1576 | 0.1036 | 50 | 1.0000 | 2541904 |
2.1476 | 0.1140 | 55 | 0.9984 | 2802668 |
2.0639 | 0.1244 | 60 | 0.9936 | 3061672 |
2.0863 | 0.1347 | 65 | 0.9958 | 3321424 |
1.7792 | 0.1451 | 70 | 0.9949 | 3567924 |
1.7329 | 0.1555 | 75 | 0.9895 | 3817752 |
1.9016 | 0.1658 | 80 | 0.9863 | 4069396 |
1.7455 | 0.1762 | 85 | 0.9823 | 4326708 |
1.4548 | 0.1866 | 90 | 0.9835 | 4578592 |
1.9925 | 0.1969 | 95 | 0.9793 | 4831136 |
1.5158 | 0.2073 | 100 | 0.9795 | 5084284 |
1.3108 | 0.2177 | 105 | 0.9788 | 5341196 |
1.6902 | 0.2280 | 110 | 0.9749 | 5606060 |
1.6011 | 0.2384 | 115 | 0.9767 | 5866064 |
1.547 | 0.2488 | 120 | 0.9708 | 6123360 |
1.3591 | 0.2591 | 125 | 0.9733 | 6381784 |
1.5019 | 0.2695 | 130 | 0.9718 | 6639256 |
1.5715 | 0.2798 | 135 | 0.9696 | 6895060 |
1.4177 | 0.2902 | 140 | 0.9694 | 7145840 |
1.467 | 0.3006 | 145 | 0.9663 | 7397572 |
1.5423 | 0.3109 | 150 | 0.9655 | 7653164 |
1.2753 | 0.3213 | 155 | 0.9656 | 7911008 |
1.5971 | 0.3317 | 160 | 0.9644 | 8163972 |
1.5416 | 0.3420 | 165 | 0.9648 | 8422184 |
1.6416 | 0.3524 | 170 | 0.9632 | 8681656 |
1.4712 | 0.3628 | 175 | 0.9615 | 8942564 |
1.6394 | 0.3731 | 180 | 0.9614 | 9199204 |
1.2702 | 0.3835 | 185 | 0.9590 | 9451368 |
1.3811 | 0.3939 | 190 | 0.9618 | 9705944 |
1.3814 | 0.4042 | 195 | 0.9619 | 9966356 |
1.5587 | 0.4146 | 200 | 0.9598 | 10223764 |
1.412 | 0.4250 | 205 | 0.9581 | 10476676 |
1.4082 | 0.4353 | 210 | 0.9587 | 10731868 |
1.517 | 0.4457 | 215 | 0.9578 | 10983192 |
1.4438 | 0.4560 | 220 | 0.9596 | 11241884 |
1.4262 | 0.4664 | 225 | 0.9566 | 11495712 |
1.2175 | 0.4768 | 230 | 0.9568 | 11746224 |
1.288 | 0.4871 | 235 | 0.9555 | 11999420 |
1.4236 | 0.4975 | 240 | 0.9561 | 12258032 |
1.2708 | 0.5079 | 245 | 0.9545 | 12509532 |
1.472 | 0.5182 | 250 | 0.9526 | 12761720 |
1.4331 | 0.5286 | 255 | 0.9551 | 13018164 |
1.2036 | 0.5390 | 260 | 0.9546 | 13280196 |
1.2403 | 0.5493 | 265 | 0.9544 | 13530496 |
1.244 | 0.5597 | 270 | 0.9533 | 13777476 |
1.5044 | 0.5701 | 275 | 0.9516 | 14036360 |
1.1313 | 0.5804 | 280 | 0.9534 | 14292528 |
1.4051 | 0.5908 | 285 | 0.9541 | 14540976 |
1.4291 | 0.6012 | 290 | 0.9500 | 14791196 |
1.3338 | 0.6115 | 295 | 0.9516 | 15045092 |
1.2997 | 0.6219 | 300 | 0.9536 | 15296648 |
1.4427 | 0.6322 | 305 | 0.9524 | 15542264 |
1.3716 | 0.6426 | 310 | 0.9491 | 15797752 |
1.4086 | 0.6530 | 315 | 0.9516 | 16051964 |
1.3173 | 0.6633 | 320 | 0.9495 | 16309408 |
1.4131 | 0.6737 | 325 | 0.9497 | 16562960 |
1.3696 | 0.6841 | 330 | 0.9478 | 16816124 |
1.3399 | 0.6944 | 335 | 0.9485 | 17068232 |
1.2074 | 0.7048 | 340 | 0.9484 | 17322548 |
1.1584 | 0.7152 | 345 | 0.9507 | 17575180 |
1.3924 | 0.7255 | 350 | 0.9476 | 17835436 |
1.3037 | 0.7359 | 355 | 0.9495 | 18084436 |
1.4192 | 0.7463 | 360 | 0.9470 | 18333840 |
1.4104 | 0.7566 | 365 | 0.9486 | 18585048 |
1.1994 | 0.7670 | 370 | 0.9462 | 18841288 |
1.4342 | 0.7774 | 375 | 0.9468 | 19098152 |
1.3477 | 0.7877 | 380 | 0.9457 | 19351632 |
1.2131 | 0.7981 | 385 | 0.9480 | 19608392 |
1.2247 | 0.8084 | 390 | 0.9453 | 19865888 |
1.2134 | 0.8188 | 395 | 0.9437 | 20121508 |
1.3281 | 0.8292 | 400 | 0.9453 | 20377088 |
1.1996 | 0.8395 | 405 | 0.9454 | 20635728 |
1.3079 | 0.8499 | 410 | 0.9455 | 20889244 |
1.1988 | 0.8603 | 415 | 0.9470 | 21145028 |
1.4831 | 0.8706 | 420 | 0.9469 | 21400236 |
1.2966 | 0.8810 | 425 | 0.9438 | 21661824 |
1.3788 | 0.8914 | 430 | 0.9466 | 21919900 |
1.2225 | 0.9017 | 435 | 0.9447 | 22173924 |
1.4329 | 0.9121 | 440 | 0.9468 | 22423548 |
1.3712 | 0.9225 | 445 | 0.9440 | 22680788 |
1.1879 | 0.9328 | 450 | 0.9441 | 22933912 |
1.1186 | 0.9432 | 455 | 0.9457 | 23183940 |
1.3402 | 0.9536 | 460 | 0.9500 | 23433908 |
1.2629 | 0.9639 | 465 | 0.9462 | 23687452 |
1.1852 | 0.9743 | 470 | 0.9449 | 23941716 |
1.2072 | 0.9846 | 475 | 0.9451 | 24191528 |
1.2218 | 0.9950 | 480 | 0.9482 | 24444424 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
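For reproducibility, the snippet below compares an installed environment against the versions listed above. It is purely illustrative and uses only the standard library.

```python
# Illustrative check of installed versions against those listed in this card.
import importlib.metadata as md

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",   # PyPI package name for Pytorch
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
for pkg, want in expected.items():
    have = md.version(pkg)
    status = "OK" if have == want else f"differs (card lists {want})"
    print(f"{pkg} {have}: {status}")
```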