medusa-microllama_305M_stage1_v2

This model is a fine-tuned version of keeeeenw/MicroLlama; the training dataset is not specified (the auto-generated card lists it as "None"). It achieves the following results on the evaluation set:

Loss: 2.5107
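If the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language-model training with Transformers, it can be read as a perplexity via exp(loss). This is only an interpretive sketch under that assumption: in a Medusa stage-1 run the logged loss may instead combine losses from several prediction heads, in which case the conversion does not apply.

```python
import math


def perplexity_from_loss(mean_ce_nats: float) -> float:
    """Convert a mean per-token cross-entropy (natural log) into perplexity.

    Assumes the loss is a plain per-token average; a summed or weighted
    multi-head loss would not map to perplexity this way.
    """
    return math.exp(mean_ce_nats)


# Final evaluation loss reported on this card.
eval_loss = 2.5107
print(f"perplexity ~ {perplexity_from_loss(eval_loss):.2f}")
```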

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 40
num_epochs: 2
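The listed values fit together as follows: total_train_batch_size is the per-device batch size times the gradient accumulation steps times the number of devices, so 2 × 4 = 8 implies a single device. The sketch below reproduces that arithmetic and the shape of a linear-warmup, cosine-decay schedule, modeled on the formula used by Transformers' get_cosine_schedule_with_warmup with its default num_cycles=0.5. The total number of optimizer steps is not stated on the card; TOTAL_STEPS = 3272 is an assumption (about 1,636 steps per epoch × 2 epochs) estimated from the epoch and step columns of the results table.

```python
import math

# Hyperparameters copied from the card.
PEAK_LR = 5e-4
TRAIN_BATCH_SIZE = 2   # per device
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 40
NUM_DEVICES = 1        # assumption: implied by total_train_batch_size = 8

# Assumption: not stated on the card; estimated from the results table.
TOTAL_STEPS = 3272


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * accum * devices


def cosine_lr(step: int, peak: float = PEAK_LR, warmup: int = WARMUP_STEPS,
              total: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to `peak`, then half-cosine decay to 0 at `total`."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("effective batch size:",
      effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
for s in (0, 20, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {s:>5}: lr = {cosine_lr(s):.2e}")
```

Under this schedule the learning rate approaches zero near the end of epoch 2, which is consistent with the validation loss flattening at 2.5107 over the last logged steps.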

Training results

Training Loss	Epoch	Step	Validation Loss
3.0312	0.0244	40	3.0649
3.026	0.0489	80	2.9528
2.8781	0.0733	120	2.9163
2.8075	0.0978	160	2.9268
2.9164	0.1222	200	2.9027
2.7724	0.1467	240	2.8815
2.8856	0.1711	280	2.8871
2.718	0.1955	320	2.8749
2.6479	0.2200	360	2.8815
2.6194	0.2444	400	2.8872
2.7954	0.2689	440	2.8773
2.7008	0.2933	480	2.8572
2.6876	0.3178	520	2.8560
2.879	0.3422	560	2.8665
2.7377	0.3666	600	2.8482
2.7459	0.3911	640	2.8512
2.8036	0.4155	680	2.8712
2.89	0.4400	720	2.8614
2.7898	0.4644	760	2.8570
2.891	0.4888	800	2.8384
2.717	0.5133	840	2.8344
2.8589	0.5377	880	2.8342
2.8944	0.5622	920	2.8040
2.85	0.5866	960	2.8012
2.8057	0.6111	1000	2.8063
2.6772	0.6355	1040	2.7957
2.7905	0.6599	1080	2.7822
2.7579	0.6844	1120	2.7922
2.7625	0.7088	1160	2.7763
2.85	0.7333	1200	2.7607
2.8447	0.7577	1240	2.7611
2.8027	0.7822	1280	2.7501
2.461	0.8066	1320	2.7201
2.6232	0.8310	1360	2.6906
2.6998	0.8555	1400	2.6763
2.7609	0.8799	1440	2.6603
2.6003	0.9044	1480	2.6549
2.2626	0.9288	1520	2.6484
2.5896	0.9533	1560	2.6389
2.5704	0.9777	1600	2.6245
2.1629	1.0021	1640	2.6164
2.1719	1.0266	1680	2.6152
2.2115	1.0510	1720	2.6134
2.359	1.0755	1760	2.6127
2.3486	1.0999	1800	2.6066
2.1864	1.1244	1840	2.6041
2.1692	1.1488	1880	2.6023
2.1455	1.1732	1920	2.5998
2.195	1.1977	1960	2.5914
2.3458	1.2221	2000	2.5883
2.1419	1.2466	2040	2.5827
2.1329	1.2710	2080	2.5743
2.2733	1.2954	2120	2.5686
2.2662	1.3199	2160	2.5654
2.399	1.3443	2200	2.5637
2.1518	1.3688	2240	2.5563
2.1115	1.3932	2280	2.5483
2.2048	1.4177	2320	2.5434
2.2658	1.4421	2360	2.5390
2.2186	1.4665	2400	2.5366
2.1467	1.4910	2440	2.5321
2.2352	1.5154	2480	2.5281
2.2507	1.5399	2520	2.5250
2.1987	1.5643	2560	2.5221
2.2234	1.5888	2600	2.5205
2.0497	1.6132	2640	2.5181
2.1133	1.6376	2680	2.5166
2.1047	1.6621	2720	2.5153
2.1578	1.6865	2760	2.5148
2.1869	1.7110	2800	2.5135
2.0953	1.7354	2840	2.5126
2.1413	1.7599	2880	2.5119
2.1333	1.7843	2920	2.5115
2.2001	1.8087	2960	2.5114
2.1889	1.8332	3000	2.5111
2.2247	1.8576	3040	2.5110
2.2258	1.8821	3080	2.5108
2.157	1.9065	3120	2.5107
2.181	1.9310	3160	2.5107
2.1441	1.9554	3200	2.5107
2.4097	1.9798	3240	2.5107
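The headline evaluation loss can be located in the table by parsing the tab-separated rows and picking the lowest validation loss. The sketch below uses only the last few rows as an excerpt (the full table parses the same way); the lowest value, 2.5107, first appears at step 3120 and matches the loss reported at the top of the card.

```python
# Excerpt of the results table above (tab-separated, same column order).
TABLE_EXCERPT = """\
Training Loss\tEpoch\tStep\tValidation Loss
2.2247\t1.8576\t3040\t2.5110
2.2258\t1.8821\t3080\t2.5108
2.157\t1.9065\t3120\t2.5107
2.181\t1.9310\t3160\t2.5107
2.1441\t1.9554\t3200\t2.5107
2.4097\t1.9798\t3240\t2.5107
"""


def parse_rows(text: str) -> list:
    """Parse a tab-separated table with a header row into a list of dicts of floats."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split("\t")
    return [dict(zip(header, map(float, ln.split("\t")))) for ln in lines[1:]]


def best_row(rows: list) -> dict:
    """Return the first row with the lowest validation loss (min keeps the earliest tie)."""
    return min(rows, key=lambda r: r["Validation Loss"])


best = best_row(parse_rows(TABLE_EXCERPT))
print(f"lowest validation loss {best['Validation Loss']} at step {int(best['Step'])}")
```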

Framework versions

Transformers 4.43.0
Pytorch 2.3.1
Datasets 2.15.0
Tokenizers 0.19.1
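To approximate the training environment, the installed packages can be compared against the versions listed above. This is a minimal sketch using only the standard library's importlib.metadata; it assumes the PyPI distribution names transformers, torch (PyTorch), datasets, and tokenizers, and it ignores local build suffixes such as +cu121 that PyTorch wheels often carry.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple

# Versions listed under "Framework versions"; PyTorch is published on PyPI as "torch".
EXPECTED = {
    "transformers": "4.43.0",
    "torch": "2.3.1",
    "datasets": "2.15.0",
    "tokenizers": "0.19.1",
}


def check_versions(expected: Dict[str, str],
                   lookup: Callable[[str], str] = version
                   ) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every package that differs."""
    mismatches = {}
    for pkg, want in expected.items():
        try:
            # Strip local build tags such as "+cu121" before comparing.
            have: Optional[str] = lookup(pkg).split("+")[0]
        except PackageNotFoundError:
            have = None  # package is not installed
        if have != want:
            mismatches[pkg] = (want, have)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(EXPECTED)
    for pkg, (want, have) in problems.items():
        print(f"{pkg}: expected {want}, found {have or 'not installed'}")
    if not problems:
        print("environment matches the listed framework versions")
```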

Repository: Momorami/medusa-microllama_305m_stage1
