mms-1b-bigcgen-baseline-model

This model is a fine-tuned version of facebook/mms-1b-all on the BIGCGEN - BEM dataset. Its validation loss and word error rate (WER) at each evaluation step are listed in the table at the end of this card.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 8
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 30.0
mixed_precision_training: Native AMP
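
The effective batch size and the shape of the learning-rate schedule follow from the values above. The sketch below is illustrative plain Python, not the training script. It assumes a single device, and it assumes roughly 327 optimizer steps per epoch, a figure inferred from the results table (step 100 corresponds to epoch 0.3058). The names `effective_batch_size` and `linear_lr` are hypothetical.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 100
NUM_EPOCHS = 30

# Assumptions (not stated in the card): one device, and about 327
# optimizer steps per epoch, inferred from the results table.
NUM_DEVICES = 1
STEPS_PER_EPOCH = 327
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def effective_batch_size() -> int:
    """Samples contributing to one optimizer update."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Linear warmup to the peak rate, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / (TOTAL_STEPS - WARMUP_STEPS)


print("effective batch size:", effective_batch_size())
for step in (0, 50, 100, 3600, TOTAL_STEPS):
    print(step, linear_lr(step))
```

Under these assumptions the effective batch size works out to 8, matching total_train_batch_size, and the learning rate peaks at step 100 before decaying linearly.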

Training Loss	Epoch	Step	Validation Loss	WER
12.3367	0.3058	100	1.3241	0.8893
1.8764	0.6116	200	0.6894	0.5815
1.6712	0.9174	300	0.6390	0.5514
1.5044	1.2232	400	0.6301	0.5351
1.6648	1.5291	500	0.6076	0.5283
1.6411	1.8349	600	0.6073	0.5283
1.4016	2.1407	700	0.5994	0.5124
1.5703	2.4465	800	0.5997	0.5162
1.4165	2.7523	900	0.5850	0.5084
1.4703	3.0581	1000	0.5912	0.5127
1.48	3.3639	1100	0.5707	0.4999
1.4769	3.6697	1200	0.5675	0.4949
1.312	3.9755	1300	0.5856	0.4980
1.3821	4.2813	1400	0.5642	0.4992
1.457	4.5872	1500	0.5588	0.5053
1.3606	4.8930	1600	0.5637	0.4866
1.3986	5.1988	1700	0.5511	0.4866
1.421	5.5046	1800	0.5846	0.5346
1.3004	5.8104	1900	0.5440	0.4736
1.3319	6.1162	2000	0.5318	0.4786
1.2665	6.4220	2100	0.5488	0.5065
1.3703	6.7278	2200	0.5304	0.4878
1.1954	7.0336	2300	0.5298	0.4807
1.2973	7.3394	2400	0.5258	0.4706
1.2086	7.6453	2500	0.5231	0.4807
1.2796	7.9511	2600	0.5404	0.4739
1.1428	8.2569	2700	0.5328	0.4831
1.3118	8.5627	2800	0.5198	0.4769
1.2569	8.8685	2900	0.5306	0.4847
1.1718	9.1743	3000	0.5160	0.4649
1.1354	9.4801	3100	0.5265	0.4777
1.2795	9.7859	3200	0.5090	0.4590
1.1793	10.0917	3300	0.5265	0.4684
1.1647	10.3976	3400	0.5385	0.4762
1.1978	10.7034	3500	0.5132	0.4715
1.1802	11.0092	3600	0.5130	0.4597
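
To see which logged checkpoint scored best, the table can be scanned programmatically. The following is a minimal sketch in plain Python; the step, validation loss, and WER columns are copied from the rows above, while the `Checkpoint` type and `parse_rows` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple

# Columns copied from the results table: step, validation loss, WER.
ROWS = """
100 1.3241 0.8893
200 0.6894 0.5815
300 0.6390 0.5514
400 0.6301 0.5351
500 0.6076 0.5283
600 0.6073 0.5283
700 0.5994 0.5124
800 0.5997 0.5162
900 0.5850 0.5084
1000 0.5912 0.5127
1100 0.5707 0.4999
1200 0.5675 0.4949
1300 0.5856 0.4980
1400 0.5642 0.4992
1500 0.5588 0.5053
1600 0.5637 0.4866
1700 0.5511 0.4866
1800 0.5846 0.5346
1900 0.5440 0.4736
2000 0.5318 0.4786
2100 0.5488 0.5065
2200 0.5304 0.4878
2300 0.5298 0.4807
2400 0.5258 0.4706
2500 0.5231 0.4807
2600 0.5404 0.4739
2700 0.5328 0.4831
2800 0.5198 0.4769
2900 0.5306 0.4847
3000 0.5160 0.4649
3100 0.5265 0.4777
3200 0.5090 0.4590
3300 0.5265 0.4684
3400 0.5385 0.4762
3500 0.5132 0.4715
3600 0.5130 0.4597
"""


class Checkpoint(NamedTuple):
    step: int
    val_loss: float
    wer: float


def parse_rows(text: str) -> list[Checkpoint]:
    """Turn whitespace-separated rows into Checkpoint records."""
    rows = []
    for line in text.strip().splitlines():
        step, val_loss, wer = line.split()
        rows.append(Checkpoint(int(step), float(val_loss), float(wer)))
    return rows


checkpoints = parse_rows(ROWS)
best_wer = min(checkpoints, key=lambda c: c.wer)
best_loss = min(checkpoints, key=lambda c: c.val_loss)
print(f"lowest WER: {best_wer.wer:.4f} at step {best_wer.step}")
print(f"lowest validation loss: {best_loss.val_loss:.4f} at step {best_loss.step}")
```

Among the rows listed, both minima fall at step 3200 (WER 0.4590, validation loss 0.5090).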