# mms-300m-upper-sorbian
This is an automatic speech recognition (ASR) model for Upper Sorbian, a minority Slavic language spoken in Saxony, Germany. The model is a fine-tuned version of facebook/mms-300m, trained on the train split of the Common Voice 17.0 dataset (Upper Sorbian, hsb).
It achieves the following results on the evaluation set (validation split):
- Loss: 0.6600
- Wer: 0.4203
- Cer: 0.0930
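WER and CER are edit-distance metrics: the number of word-level (or character-level) substitutions, insertions, and deletions divided by the reference length. As a hedged illustration (not the evaluation script used for this model, which would typically rely on the `evaluate` or `jiwer` libraries), a minimal Levenshtein-based implementation:

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return levenshtein(list(reference), list(hypothesis)) / len(reference)
```

For example, a hypothesis that gets one of four reference words wrong scores a WER of 0.25.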
## Model description

An ASR model trained on crowdsourced speech from Mozilla Common Voice. It can be used to transcribe Upper Sorbian speech into text.
## Intended uses & limitations

The model is intended to be used as a speech-to-text system. However, it has only been trained on scripted, read speech; it may therefore not perform well on conversational or spontaneous speech.
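MMS checkpoints are wav2vec2-style CTC models: they emit a per-frame distribution over characters plus a blank token, and a transcript is obtained by greedy decoding (take the argmax label per frame, collapse consecutive repeats, drop blanks). A minimal sketch of that decoding step, with a hypothetical toy vocabulary (the real tokenizer's vocabulary differs):

```python
def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Greedy CTC decoding: collapse consecutive repeats, then drop blanks."""
    chars = []
    prev = None
    for i in frame_ids:
        if i != prev and i != blank_id:
            chars.append(id_to_char[i])
        prev = i
    return "".join(chars)

# Hypothetical toy vocabulary for illustration only
vocab = {0: "<blank>", 1: "h", 2: "s", 3: "b"}
# Per-frame argmax ids, e.g. as produced by logits.argmax(-1)
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3]
print(ctc_greedy_decode(frames, vocab))  # → "hsb"
```

Note that the blank token is what lets CTC distinguish a genuinely doubled character (`h <blank> h` → "hh") from a label held across frames (`h h` → "h").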
## Training and evaluation data
Mozilla Common Voice (Upper Sorbian - hsb)
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 100
- mixed_precision_training: Native AMP
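The effective batch size follows from train_batch_size × gradient_accumulation_steps = 16 × 2 = 32, and the linear scheduler with warmup_ratio 0.1 ramps the learning rate up over the first 10% of steps, then decays it linearly to zero. A sketch of that schedule (the total step count here is illustrative, chosen to roughly match the final step in the results table):

```python
def linear_schedule_with_warmup(step, total_steps, peak_lr=3e-4, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then linear decay to 0 (as in the 'linear' scheduler)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

total = 2500  # illustrative: the results table below ends at step 2500
print(linear_schedule_with_warmup(250, total))   # peak lr at the end of warmup
print(linear_schedule_with_warmup(2500, total))  # decayed to 0 at the final step
```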
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer | Cer |
|:-------------:|:-----:|:----:|:---------------:|:---:|:---:|
| 3.408 | 3.9216 | 100 | 3.3797 | 1.0 | 1.0 |
| 3.1402 | 7.8431 | 200 | 3.1629 | 1.0 | 1.0 |
| 0.7479 | 11.7647 | 300 | 1.0200 | 0.9323 | 0.2916 |
| 0.2111 | 15.6863 | 400 | 0.7733 | 0.7095 | 0.1844 |
| 0.1842 | 19.6078 | 500 | 0.7090 | 0.6051 | 0.1549 |
| 0.0618 | 23.5294 | 600 | 0.7410 | 0.6184 | 0.1474 |
| 0.0802 | 27.4510 | 700 | 0.7037 | 0.55 | 0.1308 |
| 0.0392 | 31.3725 | 800 | 0.7951 | 0.5924 | 0.1430 |
| 0.0504 | 35.2941 | 900 | 0.7686 | 0.5418 | 0.1290 |
| 0.0436 | 39.2157 | 1000 | 0.7336 | 0.55 | 0.1239 |
| 0.0282 | 43.1373 | 1100 | 0.7303 | 0.5133 | 0.1211 |
| 0.0333 | 47.0588 | 1200 | 0.6966 | 0.5057 | 0.1204 |
| 0.0243 | 50.9804 | 1300 | 0.6883 | 0.4734 | 0.1088 |
| 0.0218 | 54.9020 | 1400 | 0.7155 | 0.5051 | 0.1168 |
| 0.0219 | 58.8235 | 1500 | 0.6778 | 0.4943 | 0.1111 |
| 0.0101 | 62.7451 | 1600 | 0.6565 | 0.4570 | 0.1063 |
| 0.012 | 66.6667 | 1700 | 0.6723 | 0.4405 | 0.1016 |
| 0.0233 | 70.5882 | 1800 | 0.6700 | 0.4589 | 0.1039 |
| 0.0075 | 74.5098 | 1900 | 0.7376 | 0.4570 | 0.1062 |
| 0.0165 | 78.4314 | 2000 | 0.7359 | 0.4443 | 0.1010 |
| 0.0071 | 82.3529 | 2100 | 0.7349 | 0.4532 | 0.1022 |
| 0.0055 | 86.2745 | 2200 | 0.6797 | 0.4411 | 0.0991 |
| 0.0051 | 90.1961 | 2300 | 0.7313 | 0.4354 | 0.0975 |
| 0.0062 | 94.1176 | 2400 | 0.6847 | 0.4203 | 0.0938 |
| 0.0142 | 98.0392 | 2500 | 0.6600 | 0.4203 | 0.0930 |
### Framework versions
- Transformers 4.42.0.dev0
- Pytorch 2.3.1+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1