# collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd1
This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9537
- Num Input Tokens Seen: 28613096
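
The card does not include a usage example; the sketch below shows one way to load the checkpoint for inference. It assumes the model is hosted on the Hub at `RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd1` and loads like any standard Gemma-2 causal LM.

```python
# Minimal inference sketch (not from the original card): assumes the checkpoint
# is hosted at RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd1
# and behaves like a standard Gemma-2 causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a 9B model in fp32 is ~36 GB; bf16 halves that
    device_map="auto",           # requires `accelerate` to be installed
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```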
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128 (train_batch_size 4 × gradient_accumulation_steps 32)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
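
For reference, these settings map roughly onto the `TrainingArguments` below. This is a sketch only: the original training script is not published, so the output directory is a placeholder, and dataset loading, model loading, and any distributed setup are omitted.

```python
# Hedged sketch of TrainingArguments matching the listed hyperparameters;
# "output_dir" is a placeholder, not the original path.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd1",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,  # 4 x 32 = effective batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```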
### Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.2335 | 0 |
1.3359 | 0.0088 | 5 | 1.2046 | 248636 |
1.1839 | 0.0175 | 10 | 1.0988 | 491672 |
1.0402 | 0.0263 | 15 | 1.0538 | 747280 |
0.8977 | 0.0351 | 20 | 1.0290 | 998768 |
0.6695 | 0.0438 | 25 | 1.0319 | 1250796 |
0.5374 | 0.0526 | 30 | 1.0345 | 1499784 |
0.4154 | 0.0614 | 35 | 1.0360 | 1748764 |
0.3295 | 0.0701 | 40 | 1.0337 | 2005896 |
0.2901 | 0.0789 | 45 | 1.0210 | 2250148 |
0.3106 | 0.0877 | 50 | 1.0183 | 2501568 |
0.2671 | 0.0965 | 55 | 1.0131 | 2757456 |
0.328 | 0.1052 | 60 | 1.0060 | 3015288 |
0.2448 | 0.1140 | 65 | 1.0001 | 3271092 |
0.2246 | 0.1228 | 70 | 0.9964 | 3521248 |
0.2362 | 0.1315 | 75 | 0.9959 | 3779680 |
0.2733 | 0.1403 | 80 | 0.9912 | 4023972 |
0.2402 | 0.1491 | 85 | 0.9902 | 4268036 |
0.2512 | 0.1578 | 90 | 0.9892 | 4519392 |
0.2321 | 0.1666 | 95 | 0.9864 | 4778620 |
0.2273 | 0.1754 | 100 | 0.9851 | 5025692 |
0.2323 | 0.1841 | 105 | 0.9838 | 5280468 |
0.3053 | 0.1929 | 110 | 0.9803 | 5530924 |
0.2245 | 0.2017 | 115 | 0.9804 | 5785864 |
0.2883 | 0.2104 | 120 | 0.9804 | 6036860 |
0.2026 | 0.2192 | 125 | 0.9785 | 6287784 |
0.2469 | 0.2280 | 130 | 0.9777 | 6545160 |
0.2019 | 0.2368 | 135 | 0.9773 | 6792968 |
0.2678 | 0.2455 | 140 | 0.9750 | 7040816 |
0.218 | 0.2543 | 145 | 0.9757 | 7293752 |
0.2226 | 0.2631 | 150 | 0.9774 | 7538784 |
0.2404 | 0.2718 | 155 | 0.9744 | 7790920 |
0.232 | 0.2806 | 160 | 0.9735 | 8039788 |
0.2165 | 0.2894 | 165 | 0.9744 | 8288388 |
0.2242 | 0.2981 | 170 | 0.9733 | 8538288 |
0.2149 | 0.3069 | 175 | 0.9725 | 8791120 |
0.1776 | 0.3157 | 180 | 0.9731 | 9036600 |
0.1956 | 0.3244 | 185 | 0.9720 | 9284368 |
0.2323 | 0.3332 | 190 | 0.9710 | 9530516 |
0.2396 | 0.3420 | 195 | 0.9705 | 9780556 |
0.1707 | 0.3507 | 200 | 0.9700 | 10030712 |
0.1939 | 0.3595 | 205 | 0.9692 | 10276572 |
0.2423 | 0.3683 | 210 | 0.9689 | 10527840 |
0.2288 | 0.3770 | 215 | 0.9701 | 10768656 |
0.1921 | 0.3858 | 220 | 0.9686 | 11021400 |
0.266 | 0.3946 | 225 | 0.9682 | 11275360 |
0.2229 | 0.4034 | 230 | 0.9660 | 11523228 |
0.2259 | 0.4121 | 235 | 0.9684 | 11777976 |
0.1639 | 0.4209 | 240 | 0.9676 | 12036000 |
0.221 | 0.4297 | 245 | 0.9663 | 12285220 |
0.3291 | 0.4384 | 250 | 0.9663 | 12536112 |
0.2452 | 0.4472 | 255 | 0.9657 | 12795356 |
0.1445 | 0.4560 | 260 | 0.9648 | 13047616 |
0.2716 | 0.4647 | 265 | 0.9629 | 13299196 |
0.1989 | 0.4735 | 270 | 0.9626 | 13542628 |
0.2011 | 0.4823 | 275 | 0.9654 | 13784996 |
0.1976 | 0.4910 | 280 | 0.9659 | 14044796 |
0.2436 | 0.4998 | 285 | 0.9621 | 14296308 |
0.1861 | 0.5086 | 290 | 0.9625 | 14547836 |
0.2246 | 0.5173 | 295 | 0.9645 | 14795728 |
0.2134 | 0.5261 | 300 | 0.9622 | 15042044 |
0.2016 | 0.5349 | 305 | 0.9615 | 15292508 |
0.191 | 0.5437 | 310 | 0.9627 | 15549036 |
0.1852 | 0.5524 | 315 | 0.9612 | 15800828 |
0.2197 | 0.5612 | 320 | 0.9603 | 16057208 |
0.1979 | 0.5700 | 325 | 0.9613 | 16313496 |
0.2359 | 0.5787 | 330 | 0.9612 | 16568676 |
0.1795 | 0.5875 | 335 | 0.9593 | 16821224 |
0.1896 | 0.5963 | 340 | 0.9601 | 17076824 |
0.2183 | 0.6050 | 345 | 0.9606 | 17329076 |
0.2005 | 0.6138 | 350 | 0.9587 | 17582136 |
0.2036 | 0.6226 | 355 | 0.9581 | 17829840 |
0.2329 | 0.6313 | 360 | 0.9602 | 18083936 |
0.1998 | 0.6401 | 365 | 0.9586 | 18336516 |
0.2645 | 0.6489 | 370 | 0.9577 | 18585836 |
0.1798 | 0.6576 | 375 | 0.9593 | 18835272 |
0.2039 | 0.6664 | 380 | 0.9580 | 19092844 |
0.2022 | 0.6752 | 385 | 0.9582 | 19347236 |
0.1866 | 0.6839 | 390 | 0.9589 | 19595964 |
0.2512 | 0.6927 | 395 | 0.9596 | 19845060 |
0.1757 | 0.7015 | 400 | 0.9581 | 20098528 |
0.1955 | 0.7103 | 405 | 0.9566 | 20351356 |
0.2391 | 0.7190 | 410 | 0.9565 | 20603916 |
0.2249 | 0.7278 | 415 | 0.9565 | 20859824 |
0.2613 | 0.7366 | 420 | 0.9563 | 21110008 |
0.2307 | 0.7453 | 425 | 0.9552 | 21361900 |
0.2076 | 0.7541 | 430 | 0.9565 | 21605480 |
0.1599 | 0.7629 | 435 | 0.9566 | 21853968 |
0.2783 | 0.7716 | 440 | 0.9552 | 22102344 |
0.2174 | 0.7804 | 445 | 0.9546 | 22348724 |
0.1421 | 0.7892 | 450 | 0.9552 | 22603820 |
0.2744 | 0.7979 | 455 | 0.9554 | 22851568 |
0.1836 | 0.8067 | 460 | 0.9555 | 23098412 |
0.1509 | 0.8155 | 465 | 0.9576 | 23345100 |
0.1987 | 0.8242 | 470 | 0.9560 | 23593736 |
0.223 | 0.8330 | 475 | 0.9559 | 23837832 |
0.1652 | 0.8418 | 480 | 0.9568 | 24091040 |
0.2086 | 0.8506 | 485 | 0.9562 | 24348968 |
0.1957 | 0.8593 | 490 | 0.9564 | 24598404 |
0.2455 | 0.8681 | 495 | 0.9556 | 24852524 |
0.1507 | 0.8769 | 500 | 0.9561 | 25099900 |
0.2063 | 0.8856 | 505 | 0.9572 | 25345184 |
0.2191 | 0.8944 | 510 | 0.9555 | 25597644 |
0.2405 | 0.9032 | 515 | 0.9561 | 25848912 |
0.2649 | 0.9119 | 520 | 0.9575 | 26096932 |
0.1879 | 0.9207 | 525 | 0.9549 | 26345864 |
0.2198 | 0.9295 | 530 | 0.9536 | 26597760 |
0.2613 | 0.9382 | 535 | 0.9541 | 26851052 |
0.2039 | 0.9470 | 540 | 0.9539 | 27105996 |
0.2108 | 0.9558 | 545 | 0.9553 | 27355996 |
0.1767 | 0.9645 | 550 | 0.9564 | 27605896 |
0.2146 | 0.9733 | 555 | 0.9564 | 27859848 |
0.2219 | 0.9821 | 560 | 0.9561 | 28118420 |
0.1854 | 0.9908 | 565 | 0.9530 | 28363872 |
0.1964 | 0.9996 | 570 | 0.9537 | 28613096 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
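
One way to pin a matching environment is a requirements file with the versions above. This is an assumption about packaging, not the original setup; in particular, the `+cu121` PyTorch build may need to come from PyTorch's wheel index rather than PyPI, depending on your platform.

```
transformers==4.44.0
torch==2.4.0
datasets==2.20.0
tokenizers==0.19.1
```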