# collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd2
This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9569
- Num Input Tokens Seen: 28770280
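
The snippet below sketches one way to load this checkpoint with the standard `transformers` API. The repo id is taken from this card; the dtype, device placement, and prompt are illustrative assumptions, not part of the original training setup.

```python
# Hedged sketch: loading the checkpoint with the standard transformers API.
# The repo id comes from this card; dtype/device choices are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 to fit a 9B model in memory
    device_map="auto",           # assumption: let accelerate place the weights
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```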
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training; a sketch of an equivalent `TrainingArguments` setup appears after the list:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
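
As a reference, here is a minimal sketch of how these values could be expressed as a `transformers` `TrainingArguments` object. The `output_dir` is hypothetical, and the Adam betas and epsilon listed above match the library defaults, so they are left implicit; only the listed values are taken from this card.

```python
# Hedged sketch: the hyperparameters above expressed as TrainingArguments.
# output_dir is hypothetical; Adam betas=(0.9, 0.999) and epsilon=1e-08 are
# the transformers defaults, so no explicit arguments are needed for them.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd2",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,  # 4 per device * 32 steps = 128 total batch
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```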

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.2335 | 0 |
1.4397 | 0.0088 | 5 | 1.2035 | 252296 |
1.3086 | 0.0175 | 10 | 1.0915 | 506500 |
1.1494 | 0.0263 | 15 | 1.0445 | 760592 |
1.0067 | 0.0351 | 20 | 1.0258 | 1004332 |
0.7695 | 0.0438 | 25 | 1.0296 | 1254744 |
0.5357 | 0.0526 | 30 | 1.0276 | 1508608 |
0.3396 | 0.0614 | 35 | 1.0433 | 1755340 |
0.3616 | 0.0701 | 40 | 1.0337 | 2011180 |
0.3358 | 0.0789 | 45 | 1.0260 | 2258492 |
0.2944 | 0.0877 | 50 | 1.0259 | 2516564 |
0.2253 | 0.0964 | 55 | 1.0194 | 2766940 |
0.3132 | 0.1052 | 60 | 1.0127 | 3021564 |
0.3384 | 0.1140 | 65 | 1.0111 | 3277484 |
0.255 | 0.1227 | 70 | 1.0058 | 3527464 |
0.2462 | 0.1315 | 75 | 1.0030 | 3776276 |
0.3055 | 0.1403 | 80 | 0.9994 | 4030452 |
0.2831 | 0.1490 | 85 | 0.9933 | 4273320 |
0.2466 | 0.1578 | 90 | 0.9936 | 4525792 |
0.2042 | 0.1665 | 95 | 0.9905 | 4776548 |
0.2649 | 0.1753 | 100 | 0.9880 | 5031264 |
0.1964 | 0.1841 | 105 | 0.9880 | 5281260 |
0.2198 | 0.1928 | 110 | 0.9848 | 5527728 |
0.2088 | 0.2016 | 115 | 0.9826 | 5785504 |
0.2486 | 0.2104 | 120 | 0.9819 | 6039476 |
0.3276 | 0.2191 | 125 | 0.9812 | 6296036 |
0.2672 | 0.2279 | 130 | 0.9793 | 6551664 |
0.2834 | 0.2367 | 135 | 0.9796 | 6805564 |
0.2273 | 0.2454 | 140 | 0.9765 | 7060732 |
0.2008 | 0.2542 | 145 | 0.9756 | 7309668 |
0.2399 | 0.2630 | 150 | 0.9746 | 7567368 |
0.2562 | 0.2717 | 155 | 0.9754 | 7825264 |
0.1821 | 0.2805 | 160 | 0.9747 | 8074164 |
0.2303 | 0.2893 | 165 | 0.9745 | 8321340 |
0.1777 | 0.2980 | 170 | 0.9745 | 8577896 |
0.2103 | 0.3068 | 175 | 0.9737 | 8822656 |
0.2213 | 0.3156 | 180 | 0.9733 | 9070192 |
0.2599 | 0.3243 | 185 | 0.9723 | 9324820 |
0.1466 | 0.3331 | 190 | 0.9734 | 9576412 |
0.2797 | 0.3419 | 195 | 0.9722 | 9818756 |
0.1883 | 0.3506 | 200 | 0.9709 | 10071484 |
0.2503 | 0.3594 | 205 | 0.9747 | 10320876 |
0.1916 | 0.3682 | 210 | 0.9726 | 10571940 |
0.2219 | 0.3769 | 215 | 0.9713 | 10824124 |
0.1512 | 0.3857 | 220 | 0.9710 | 11073656 |
0.2753 | 0.3945 | 225 | 0.9711 | 11328296 |
0.2557 | 0.4032 | 230 | 0.9699 | 11579460 |
0.2299 | 0.4120 | 235 | 0.9688 | 11830412 |
0.2205 | 0.4208 | 240 | 0.9680 | 12080820 |
0.1929 | 0.4295 | 245 | 0.9689 | 12333420 |
0.2273 | 0.4383 | 250 | 0.9678 | 12583240 |
0.2647 | 0.4470 | 255 | 0.9680 | 12834684 |
0.2449 | 0.4558 | 260 | 0.9694 | 13089408 |
0.2749 | 0.4646 | 265 | 0.9651 | 13340368 |
0.2647 | 0.4733 | 270 | 0.9642 | 13597720 |
0.234 | 0.4821 | 275 | 0.9656 | 13851564 |
0.2691 | 0.4909 | 280 | 0.9660 | 14104972 |
0.3277 | 0.4996 | 285 | 0.9658 | 14352908 |
0.2189 | 0.5084 | 290 | 0.9645 | 14605176 |
0.1958 | 0.5172 | 295 | 0.9635 | 14851964 |
0.2484 | 0.5259 | 300 | 0.9657 | 15106352 |
0.2042 | 0.5347 | 305 | 0.9664 | 15362776 |
0.2009 | 0.5435 | 310 | 0.9649 | 15612868 |
0.2497 | 0.5522 | 315 | 0.9633 | 15864520 |
0.2446 | 0.5610 | 320 | 0.9637 | 16117664 |
0.1779 | 0.5698 | 325 | 0.9640 | 16371944 |
0.2356 | 0.5785 | 330 | 0.9656 | 16623560 |
0.2123 | 0.5873 | 335 | 0.9627 | 16872720 |
0.2159 | 0.5961 | 340 | 0.9623 | 17124308 |
0.2092 | 0.6048 | 345 | 0.9625 | 17372836 |
0.2281 | 0.6136 | 350 | 0.9645 | 17628728 |
0.2634 | 0.6224 | 355 | 0.9659 | 17879612 |
0.2312 | 0.6311 | 360 | 0.9631 | 18135028 |
0.2888 | 0.6399 | 365 | 0.9607 | 18390692 |
0.2695 | 0.6487 | 370 | 0.9595 | 18648440 |
0.1614 | 0.6574 | 375 | 0.9628 | 18901900 |
0.2464 | 0.6662 | 380 | 0.9650 | 19156540 |
0.277 | 0.6750 | 385 | 0.9602 | 19410216 |
0.1922 | 0.6837 | 390 | 0.9606 | 19666028 |
0.1204 | 0.6925 | 395 | 0.9616 | 19925752 |
0.1864 | 0.7013 | 400 | 0.9615 | 20172712 |
0.1827 | 0.7100 | 405 | 0.9629 | 20430744 |
0.2452 | 0.7188 | 410 | 0.9617 | 20684588 |
0.1543 | 0.7276 | 415 | 0.9593 | 20937816 |
0.1891 | 0.7363 | 420 | 0.9594 | 21188956 |
0.2248 | 0.7451 | 425 | 0.9609 | 21450228 |
0.2304 | 0.7538 | 430 | 0.9638 | 21695608 |
0.1279 | 0.7626 | 435 | 0.9627 | 21947316 |
0.1945 | 0.7714 | 440 | 0.9588 | 22199944 |
0.3 | 0.7801 | 445 | 0.9581 | 22457408 |
0.2061 | 0.7889 | 450 | 0.9576 | 22707864 |
0.1922 | 0.7977 | 455 | 0.9582 | 22959288 |
0.2542 | 0.8064 | 460 | 0.9590 | 23208268 |
0.2286 | 0.8152 | 465 | 0.9589 | 23457504 |
0.2163 | 0.8240 | 470 | 0.9549 | 23708000 |
0.2045 | 0.8327 | 475 | 0.9545 | 23965028 |
0.1752 | 0.8415 | 480 | 0.9557 | 24210004 |
0.1393 | 0.8503 | 485 | 0.9574 | 24465820 |
0.173 | 0.8590 | 490 | 0.9568 | 24715180 |
0.216 | 0.8678 | 495 | 0.9551 | 24970732 |
0.2171 | 0.8766 | 500 | 0.9550 | 25226344 |
0.2616 | 0.8853 | 505 | 0.9574 | 25478572 |
0.1634 | 0.8941 | 510 | 0.9542 | 25734644 |
0.2093 | 0.9029 | 515 | 0.9542 | 25984972 |
0.1975 | 0.9116 | 520 | 0.9552 | 26237876 |
0.2017 | 0.9204 | 525 | 0.9557 | 26490584 |
0.2441 | 0.9292 | 530 | 0.9551 | 26744188 |
0.2348 | 0.9379 | 535 | 0.9551 | 26994276 |
0.1623 | 0.9467 | 540 | 0.9546 | 27245784 |
0.2043 | 0.9555 | 545 | 0.9559 | 27498260 |
0.2488 | 0.9642 | 550 | 0.9581 | 27757760 |
0.2076 | 0.9730 | 555 | 0.9563 | 28012680 |
0.2058 | 0.9818 | 560 | 0.9562 | 28269220 |
0.1892 | 0.9905 | 565 | 0.9563 | 28519236 |
0.2218 | 0.9993 | 570 | 0.9569 | 28770280 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1