# collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd1
This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9431
- Num Input Tokens Seen: 24113104
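
A minimal usage sketch follows, assuming the checkpoint is hosted under the repo id shown in this card's title and loads with the standard `transformers` causal-LM classes; the dtype and device choices here are illustrative, not prescribed by the card:

```python
# Sketch: load the fine-tuned checkpoint with transformers.
# The repo id is taken from this card; bfloat16 and device_map are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # 27B parameters; fp32 weights alone would need ~108 GB
    device_map="auto",           # requires the `accelerate` package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```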
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
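
For reference, here is a sketch of how these values map onto `transformers.TrainingArguments`, assuming the standard `Trainer` API was used; the dataset, model wrapping, and any other settings are not documented on this card:

```python
# Sketch only: maps the hyperparameters listed above onto TrainingArguments.
# output_dir and everything not listed on the card are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,  # 4 x 32 = total train batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # Adam settings below are the library defaults and match the values listed above.
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```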
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:---:|:---:|:---:|:---:|:---:|
No log | 0 | 0 | 1.1282 | 0 |
3.2254 | 0.0108 | 5 | 1.0961 | 268092 |
2.9876 | 0.0215 | 10 | 1.0148 | 525364 |
3.0862 | 0.0323 | 15 | 1.0007 | 783936 |
2.8353 | 0.0431 | 20 | 0.9916 | 1042564 |
2.8331 | 0.0539 | 25 | 0.9929 | 1303404 |
2.8261 | 0.0646 | 30 | 0.9939 | 1563492 |
2.3827 | 0.0754 | 35 | 1.0007 | 1820252 |
2.4513 | 0.0862 | 40 | 0.9970 | 2078400 |
2.487 | 0.0970 | 45 | 0.9878 | 2336672 |
2.4506 | 0.1077 | 50 | 0.9944 | 2596612 |
2.4163 | 0.1185 | 55 | 0.9918 | 2850740 |
1.9844 | 0.1293 | 60 | 0.9928 | 3111820 |
2.1404 | 0.1400 | 65 | 0.9893 | 3372896 |
2.1749 | 0.1508 | 70 | 0.9831 | 3630380 |
1.9777 | 0.1616 | 75 | 0.9831 | 3893304 |
1.8699 | 0.1724 | 80 | 0.9804 | 4157048 |
1.8005 | 0.1831 | 85 | 0.9768 | 4414040 |
1.9942 | 0.1939 | 90 | 0.9743 | 4679364 |
1.6692 | 0.2047 | 95 | 0.9746 | 4942204 |
2.025 | 0.2154 | 100 | 0.9723 | 5202156 |
1.7424 | 0.2262 | 105 | 0.9720 | 5462920 |
1.7119 | 0.2370 | 110 | 0.9691 | 5722880 |
1.6313 | 0.2478 | 115 | 0.9701 | 5983564 |
1.8441 | 0.2585 | 120 | 0.9693 | 6239988 |
1.8205 | 0.2693 | 125 | 0.9658 | 6503448 |
1.7461 | 0.2801 | 130 | 0.9661 | 6765768 |
1.9212 | 0.2909 | 135 | 0.9648 | 7026576 |
1.7037 | 0.3016 | 140 | 0.9630 | 7278368 |
1.8345 | 0.3124 | 145 | 0.9629 | 7543060 |
1.6023 | 0.3232 | 150 | 0.9593 | 7798308 |
1.5075 | 0.3339 | 155 | 0.9614 | 8064352 |
1.7006 | 0.3447 | 160 | 0.9565 | 8324852 |
1.6318 | 0.3555 | 165 | 0.9573 | 8577584 |
1.7273 | 0.3663 | 170 | 0.9564 | 8840932 |
1.5702 | 0.3770 | 175 | 0.9559 | 9102784 |
1.4506 | 0.3878 | 180 | 0.9566 | 9361972 |
1.7453 | 0.3986 | 185 | 0.9535 | 9617376 |
1.6234 | 0.4093 | 190 | 0.9561 | 9873880 |
1.7928 | 0.4201 | 195 | 0.9554 | 10136916 |
1.7004 | 0.4309 | 200 | 0.9518 | 10397756 |
1.4518 | 0.4417 | 205 | 0.9541 | 10650948 |
1.4489 | 0.4524 | 210 | 0.9528 | 10911356 |
1.6097 | 0.4632 | 215 | 0.9547 | 11164796 |
1.7534 | 0.4740 | 220 | 0.9519 | 11425528 |
1.7858 | 0.4848 | 225 | 0.9500 | 11683928 |
1.7461 | 0.4955 | 230 | 0.9497 | 11934068 |
1.5409 | 0.5063 | 235 | 0.9506 | 12195672 |
1.5174 | 0.5171 | 240 | 0.9503 | 12449748 |
1.667 | 0.5278 | 245 | 0.9486 | 12711732 |
1.7769 | 0.5386 | 250 | 0.9474 | 12969368 |
1.6375 | 0.5494 | 255 | 0.9475 | 13229192 |
1.6864 | 0.5602 | 260 | 0.9481 | 13492212 |
1.5925 | 0.5709 | 265 | 0.9483 | 13748524 |
1.5207 | 0.5817 | 270 | 0.9468 | 14010960 |
1.8265 | 0.5925 | 275 | 0.9454 | 14273300 |
1.8 | 0.6032 | 280 | 0.9454 | 14533716 |
1.6235 | 0.6140 | 285 | 0.9487 | 14794608 |
1.4906 | 0.6248 | 290 | 0.9464 | 15062712 |
1.415 | 0.6356 | 295 | 0.9462 | 15320116 |
1.6232 | 0.6463 | 300 | 0.9459 | 15576780 |
1.6258 | 0.6571 | 305 | 0.9470 | 15843180 |
1.4792 | 0.6679 | 310 | 0.9449 | 16103364 |
1.4777 | 0.6787 | 315 | 0.9449 | 16362160 |
1.6598 | 0.6894 | 320 | 0.9445 | 16630092 |
1.564 | 0.7002 | 325 | 0.9441 | 16893764 |
1.2829 | 0.7110 | 330 | 0.9448 | 17150196 |
1.5467 | 0.7217 | 335 | 0.9439 | 17413432 |
1.2518 | 0.7325 | 340 | 0.9450 | 17670408 |
1.5517 | 0.7433 | 345 | 0.9462 | 17931036 |
1.4484 | 0.7541 | 350 | 0.9444 | 18192332 |
1.3657 | 0.7648 | 355 | 0.9439 | 18454740 |
1.5337 | 0.7756 | 360 | 0.9456 | 18720156 |
1.5494 | 0.7864 | 365 | 0.9460 | 18984560 |
1.3332 | 0.7971 | 370 | 0.9442 | 19249668 |
1.4732 | 0.8079 | 375 | 0.9424 | 19505204 |
1.4731 | 0.8187 | 380 | 0.9433 | 19759544 |
1.3394 | 0.8295 | 385 | 0.9462 | 20018204 |
1.501 | 0.8402 | 390 | 0.9437 | 20276620 |
1.38 | 0.8510 | 395 | 0.9437 | 20532420 |
1.4448 | 0.8618 | 400 | 0.9439 | 20791532 |
1.5206 | 0.8726 | 405 | 0.9446 | 21047984 |
1.4492 | 0.8833 | 410 | 0.9440 | 21305632 |
1.4171 | 0.8941 | 415 | 0.9431 | 21568260 |
1.5316 | 0.9049 | 420 | 0.9447 | 21824664 |
1.4805 | 0.9156 | 425 | 0.9425 | 22084792 |
1.7193 | 0.9264 | 430 | 0.9407 | 22342320 |
1.7271 | 0.9372 | 435 | 0.9431 | 22600956 |
1.6102 | 0.9480 | 440 | 0.9432 | 22861708 |
1.6152 | 0.9587 | 445 | 0.9429 | 23118340 |
1.362 | 0.9695 | 450 | 0.9406 | 23379296 |
1.51 | 0.9803 | 455 | 0.9446 | 23638084 |
1.3728 | 0.9910 | 460 | 0.9421 | 23906316 |
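
Assuming the reported validation loss is the usual mean per-token cross-entropy (the `transformers` default for causal LMs), the final eval loss of 0.9431 corresponds to a token-level perplexity of roughly exp(0.9431) ≈ 2.57:

```python
# Assumes the loss is mean per-token cross-entropy in nats.
import math
print(math.exp(0.9431))  # ~2.568, the corresponding perplexity
```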
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
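
A quick way to check a local environment against these versions (a sketch; the attributes below are the standard version strings exposed by each package):

```python
# Compare installed versions against those listed above.
import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # expected: 4.44.0
print(torch.__version__)         # expected: 2.4.0+cu121
print(datasets.__version__)      # expected: 2.20.0
print(tokenizers.__version__)    # expected: 0.19.1
```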