# collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd0
This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9437
- Num Input Tokens Seen: 24429136
## Model description
More information needed
## Intended uses & limitations
More information needed
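In the absence of further guidance from the author, the checkpoint can be loaded with the standard Transformers API. The sketch below is illustrative only: the `bfloat16` dtype and the `device_map="auto"` placement (which requires the `accelerate` package) are assumptions for fitting a 27B-parameter model, not recommendations from this card.

```python
# Minimal loading/inference sketch, assuming the standard Transformers API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 to halve memory for a 27B model
    device_map="auto",           # assumption: shard across available GPUs (needs accelerate)
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```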
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 32
- total_train_batch_size: 128 (train_batch_size × gradient_accumulation_steps = 4 × 32)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
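The training script itself is not included in this card; the following is a minimal sketch of how the hyperparameters above would map onto Hugging Face `TrainingArguments`, assuming the standard `Trainer` API was used. The `output_dir` is hypothetical, and the Adam betas/epsilon listed above match the `TrainingArguments` defaults, so they are not set explicitly.

```python
# Sketch of TrainingArguments matching the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd0",  # hypothetical path
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 effective train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8 are the defaults
)
```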
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.1282 | 0 |
3.2167 | 0.0105 | 5 | 1.0957 | 260288 |
2.852 | 0.0211 | 10 | 1.0143 | 515120 |
3.1183 | 0.0316 | 15 | 1.0029 | 771156 |
3.0936 | 0.0421 | 20 | 0.9878 | 1024572 |
2.7326 | 0.0527 | 25 | 0.9873 | 1286740 |
2.8651 | 0.0632 | 30 | 0.9935 | 1544608 |
2.4543 | 0.0738 | 35 | 0.9912 | 1797588 |
2.3316 | 0.0843 | 40 | 0.9939 | 2059220 |
2.2969 | 0.0948 | 45 | 0.9962 | 2316228 |
2.209 | 0.1054 | 50 | 0.9957 | 2573616 |
2.2011 | 0.1159 | 55 | 0.9901 | 2828304 |
2.249 | 0.1264 | 60 | 0.9885 | 3089580 |
1.9603 | 0.1370 | 65 | 0.9874 | 3344980 |
1.8445 | 0.1475 | 70 | 0.9853 | 3594824 |
2.0684 | 0.1580 | 75 | 0.9810 | 3851312 |
2.126 | 0.1686 | 80 | 0.9800 | 4107272 |
1.985 | 0.1791 | 85 | 0.9843 | 4370976 |
1.8983 | 0.1896 | 90 | 0.9811 | 4618784 |
1.9013 | 0.2002 | 95 | 0.9777 | 4879132 |
1.698 | 0.2107 | 100 | 0.9769 | 5133180 |
1.8366 | 0.2213 | 105 | 0.9777 | 5386652 |
1.6669 | 0.2318 | 110 | 0.9781 | 5651192 |
1.6158 | 0.2423 | 115 | 0.9741 | 5916924 |
1.7143 | 0.2529 | 120 | 0.9717 | 6175092 |
1.8355 | 0.2634 | 125 | 0.9722 | 6425136 |
1.5842 | 0.2739 | 130 | 0.9702 | 6678384 |
1.6173 | 0.2845 | 135 | 0.9698 | 6939592 |
1.6637 | 0.2950 | 140 | 0.9697 | 7196320 |
1.7279 | 0.3055 | 145 | 0.9669 | 7458004 |
1.7614 | 0.3161 | 150 | 0.9673 | 7716756 |
1.5623 | 0.3266 | 155 | 0.9649 | 7977856 |
1.4405 | 0.3372 | 160 | 0.9619 | 8235564 |
1.6714 | 0.3477 | 165 | 0.9663 | 8484732 |
1.7076 | 0.3582 | 170 | 0.9628 | 8745024 |
1.6164 | 0.3688 | 175 | 0.9610 | 8997416 |
1.7585 | 0.3793 | 180 | 0.9595 | 9257192 |
1.4447 | 0.3898 | 185 | 0.9606 | 9512832 |
1.5863 | 0.4004 | 190 | 0.9586 | 9768872 |
1.5235 | 0.4109 | 195 | 0.9593 | 10028040 |
1.5822 | 0.4214 | 200 | 0.9581 | 10285048 |
1.5285 | 0.4320 | 205 | 0.9548 | 10542956 |
1.5484 | 0.4425 | 210 | 0.9568 | 10812508 |
1.4607 | 0.4530 | 215 | 0.9546 | 11069192 |
1.4989 | 0.4636 | 220 | 0.9549 | 11316832 |
1.5499 | 0.4741 | 225 | 0.9533 | 11581384 |
1.3848 | 0.4847 | 230 | 0.9544 | 11838708 |
1.3471 | 0.4952 | 235 | 0.9543 | 12091016 |
1.2328 | 0.5057 | 240 | 0.9527 | 12347520 |
1.3087 | 0.5163 | 245 | 0.9532 | 12604588 |
1.3999 | 0.5268 | 250 | 0.9542 | 12855508 |
1.5176 | 0.5373 | 255 | 0.9548 | 13115696 |
1.3977 | 0.5479 | 260 | 0.9521 | 13369976 |
1.14 | 0.5584 | 265 | 0.9528 | 13629368 |
1.4824 | 0.5689 | 270 | 0.9539 | 13889796 |
1.2656 | 0.5795 | 275 | 0.9525 | 14149180 |
1.6385 | 0.5900 | 280 | 0.9504 | 14410972 |
1.6261 | 0.6006 | 285 | 0.9521 | 14667904 |
1.3793 | 0.6111 | 290 | 0.9497 | 14930112 |
1.4541 | 0.6216 | 295 | 0.9507 | 15189508 |
1.3924 | 0.6322 | 300 | 0.9490 | 15444108 |
1.5557 | 0.6427 | 305 | 0.9510 | 15706864 |
1.0083 | 0.6532 | 310 | 0.9509 | 15961260 |
1.3866 | 0.6638 | 315 | 0.9494 | 16222564 |
1.1968 | 0.6743 | 320 | 0.9510 | 16479072 |
1.4406 | 0.6848 | 325 | 0.9499 | 16744040 |
1.5105 | 0.6954 | 330 | 0.9499 | 17008220 |
1.6523 | 0.7059 | 335 | 0.9473 | 17259972 |
1.5404 | 0.7164 | 340 | 0.9504 | 17518816 |
1.3327 | 0.7270 | 345 | 0.9472 | 17775008 |
1.513 | 0.7375 | 350 | 0.9492 | 18032752 |
1.2379 | 0.7481 | 355 | 0.9465 | 18285632 |
1.3717 | 0.7586 | 360 | 0.9490 | 18539636 |
1.3086 | 0.7691 | 365 | 0.9472 | 18799412 |
1.3041 | 0.7797 | 370 | 0.9514 | 19059540 |
1.3024 | 0.7902 | 375 | 0.9434 | 19316932 |
1.5247 | 0.8007 | 380 | 0.9489 | 19569092 |
1.2434 | 0.8113 | 385 | 0.9464 | 19826932 |
1.2871 | 0.8218 | 390 | 0.9474 | 20085368 |
1.2104 | 0.8323 | 395 | 0.9458 | 20341548 |
1.4294 | 0.8429 | 400 | 0.9471 | 20601432 |
1.7209 | 0.8534 | 405 | 0.9453 | 20862684 |
1.4075 | 0.8640 | 410 | 0.9487 | 21111520 |
1.2768 | 0.8745 | 415 | 0.9468 | 21367704 |
1.3763 | 0.8850 | 420 | 0.9483 | 21630016 |
1.7273 | 0.8956 | 425 | 0.9465 | 21885988 |
1.1818 | 0.9061 | 430 | 0.9453 | 22144724 |
1.3393 | 0.9166 | 435 | 0.9423 | 22401468 |
1.2411 | 0.9272 | 440 | 0.9450 | 22657412 |
1.6222 | 0.9377 | 445 | 0.9431 | 22912156 |
1.3318 | 0.9482 | 450 | 0.9449 | 23172860 |
1.4488 | 0.9588 | 455 | 0.9428 | 23440376 |
1.2637 | 0.9693 | 460 | 0.9425 | 23699824 |
1.1678 | 0.9798 | 465 | 0.9441 | 23960164 |
1.6635 | 0.9904 | 470 | 0.9436 | 24220104 |
## Framework versions
- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1