collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd2

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9401
  • Num Input Tokens Seen: 21285160
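
The card does not include usage instructions, so here is a minimal loading sketch. It assumes the checkpoint is hosted on the Hugging Face Hub under the repo id matching this card's name, and that the weights are in bfloat16 (the usual dtype for Gemma-2 checkpoints); device_map="auto" additionally requires the accelerate package.

```python
# Minimal loading sketch; repo id and BF16 dtype are assumptions noted above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed: BF16 weights
    device_map="auto",           # requires `accelerate`; shards the 27B model across available GPUs
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```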

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
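
As a rough illustration only, the hyperparameters above map onto transformers.TrainingArguments as sketched below. The actual training script and dataset are not documented on this card, so the output_dir name, the 5-step logging/eval cadence (inferred from the results table), and the bf16 flag are assumptions rather than the author's code.

```python
# Sketch, not the author's script: reconstructs the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd2",  # hypothetical
    learning_rate=8e-06,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,   # 4 per device x 32 accumulation steps = 128 effective
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999), epsilon=1e-08 as listed
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    eval_strategy="steps",            # inferred: the results table evaluates every 5 steps
    eval_steps=5,
    logging_steps=5,
    bf16=True,                        # assumption, matching the BF16 checkpoint
)
```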

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.1282 | 0 |
| 3.2221 | 0.0117 | 5 | 1.0926 | 251756 |
| 2.8365 | 0.0234 | 10 | 1.0155 | 507320 |
| 2.7913 | 0.0350 | 15 | 0.9995 | 754196 |
| 2.5411 | 0.0467 | 20 | 0.9870 | 998220 |
| 2.3748 | 0.0584 | 25 | 0.9889 | 1249252 |
| 2.5183 | 0.0701 | 30 | 0.9940 | 1494160 |
| 2.2618 | 0.0818 | 35 | 0.9941 | 1742232 |
| 2.2768 | 0.0935 | 40 | 0.9943 | 1984428 |
| 1.9749 | 0.1051 | 45 | 0.9904 | 2230660 |
| 1.8335 | 0.1168 | 50 | 0.9954 | 2480560 |
| 2.0343 | 0.1285 | 55 | 0.9924 | 2738660 |
| 1.9019 | 0.1402 | 60 | 0.9870 | 2984676 |
| 1.7913 | 0.1519 | 65 | 0.9850 | 3232276 |
| 1.7679 | 0.1635 | 70 | 0.9835 | 3481768 |
| 1.5726 | 0.1752 | 75 | 0.9809 | 3725896 |
| 1.3122 | 0.1869 | 80 | 0.9820 | 3976404 |
| 1.2818 | 0.1986 | 85 | 0.9803 | 4222020 |
| 1.2534 | 0.2103 | 90 | 0.9755 | 4467096 |
| 1.3957 | 0.2219 | 95 | 0.9715 | 4712856 |
| 1.4468 | 0.2336 | 100 | 0.9735 | 4966776 |
| 1.2346 | 0.2453 | 105 | 0.9688 | 5219188 |
| 1.375 | 0.2570 | 110 | 0.9661 | 5470400 |
| 1.2864 | 0.2687 | 115 | 0.9675 | 5718348 |
| 1.2863 | 0.2804 | 120 | 0.9653 | 5962556 |
| 1.2904 | 0.2920 | 125 | 0.9654 | 6212032 |
| 1.292 | 0.3037 | 130 | 0.9624 | 6457492 |
| 1.2084 | 0.3154 | 135 | 0.9630 | 6706428 |
| 1.2862 | 0.3271 | 140 | 0.9621 | 6958064 |
| 1.2497 | 0.3388 | 145 | 0.9612 | 7208000 |
| 1.0042 | 0.3504 | 150 | 0.9585 | 7455840 |
| 1.2159 | 0.3621 | 155 | 0.9577 | 7709904 |
| 1.2636 | 0.3738 | 160 | 0.9569 | 7958904 |
| 1.1413 | 0.3855 | 165 | 0.9598 | 8201696 |
| 1.232 | 0.3972 | 170 | 0.9544 | 8459644 |
| 1.2286 | 0.4088 | 175 | 0.9553 | 8707480 |
| 1.2674 | 0.4205 | 180 | 0.9535 | 8955328 |
| 1.203 | 0.4322 | 185 | 0.9543 | 9198592 |
| 1.1438 | 0.4439 | 190 | 0.9509 | 9453684 |
| 1.3743 | 0.4556 | 195 | 0.9539 | 9703056 |
| 1.4924 | 0.4673 | 200 | 0.9497 | 9951200 |
| 1.2615 | 0.4789 | 205 | 0.9529 | 10190064 |
| 1.1522 | 0.4906 | 210 | 0.9509 | 10442280 |
| 1.1088 | 0.5023 | 215 | 0.9542 | 10691252 |
| 1.1145 | 0.5140 | 220 | 0.9497 | 10932600 |
| 1.1479 | 0.5257 | 225 | 0.9498 | 11177348 |
| 1.1476 | 0.5373 | 230 | 0.9497 | 11426060 |
| 1.3338 | 0.5490 | 235 | 0.9495 | 11679624 |
| 1.1771 | 0.5607 | 240 | 0.9504 | 11928780 |
| 0.9654 | 0.5724 | 245 | 0.9466 | 12180408 |
| 1.1334 | 0.5841 | 250 | 0.9489 | 12427700 |
| 1.1846 | 0.5958 | 255 | 0.9483 | 12685832 |
| 1.2411 | 0.6074 | 260 | 0.9477 | 12932524 |
| 1.2086 | 0.6191 | 265 | 0.9479 | 13179472 |
| 1.1832 | 0.6308 | 270 | 0.9467 | 13433160 |
| 1.1775 | 0.6425 | 275 | 0.9466 | 13682108 |
| 1.2339 | 0.6542 | 280 | 0.9456 | 13931324 |
| 1.2441 | 0.6658 | 285 | 0.9469 | 14185032 |
| 1.0774 | 0.6775 | 290 | 0.9439 | 14433164 |
| 1.1275 | 0.6892 | 295 | 0.9443 | 14674964 |
| 1.0283 | 0.7009 | 300 | 0.9427 | 14922084 |
| 1.0613 | 0.7126 | 305 | 0.9453 | 15175408 |
| 0.8593 | 0.7242 | 310 | 0.9424 | 15427908 |
| 1.1358 | 0.7359 | 315 | 0.9448 | 15675160 |
| 1.1232 | 0.7476 | 320 | 0.9434 | 15927548 |
| 1.1183 | 0.7593 | 325 | 0.9437 | 16174772 |
| 1.1012 | 0.7710 | 330 | 0.9442 | 16433432 |
| 1.1579 | 0.7827 | 335 | 0.9423 | 16680244 |
| 0.8979 | 0.7943 | 340 | 0.9453 | 16924192 |
| 1.1912 | 0.8060 | 345 | 0.9409 | 17168560 |
| 1.0824 | 0.8177 | 350 | 0.9446 | 17422176 |
| 1.1499 | 0.8294 | 355 | 0.9414 | 17672716 |
| 0.8825 | 0.8411 | 360 | 0.9413 | 17908784 |
| 1.0893 | 0.8527 | 365 | 0.9433 | 18156560 |
| 0.9911 | 0.8644 | 370 | 0.9458 | 18401204 |
| 1.0546 | 0.8761 | 375 | 0.9444 | 18658068 |
| 1.0192 | 0.8878 | 380 | 0.9412 | 18899828 |
| 0.9538 | 0.8995 | 385 | 0.9448 | 19146148 |
| 1.2351 | 0.9111 | 390 | 0.9429 | 19398392 |
| 1.1466 | 0.9228 | 395 | 0.9434 | 19649472 |
| 0.886 | 0.9345 | 400 | 0.9437 | 19896600 |
| 0.95 | 0.9462 | 405 | 0.9446 | 20149368 |
| 1.0627 | 0.9579 | 410 | 0.9423 | 20397044 |
| 1.0381 | 0.9696 | 415 | 0.9434 | 20643720 |
| 1.035 | 0.9812 | 420 | 0.9432 | 20895920 |
| 0.8919 | 0.9929 | 425 | 0.9423 | 21138916 |
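
The validation losses above are mean per-token cross-entropies in nats (the standard causal-LM loss reported by the Hugging Face Trainer), so they convert to perplexity by exponentiation. A quick check on the final evaluation loss reported at the top of this card:

```python
import math

eval_loss = 0.9401                  # final evaluation loss from the card header
perplexity = math.exp(eval_loss)    # loss in nats -> perplexity
print(f"perplexity = {perplexity:.2f}")  # ~2.56
```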

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1