---
license: cc-by-nc-4.0
base_model: facebook/mms-300m
language: hsb
tags:
  - automatic-speech-recognition
  - Upper-Sorbian
  - pytorch
  - transformers
  - MMS
datasets:
  - common_voice_17_0
metrics:
  - wer
model-index:
  - name: mms-300m-upper-sorbian
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice_17_0
          type: common_voice_17_0
          config: hsb
          split: validation
          args: hsb
        metrics:
          - name: Wer
            type: wer
            value: 0.42025316455696204
pipeline_tag: automatic-speech-recognition
library_name: transformers
---

# mms-300m-upper-sorbian

This is an automatic speech recognition (ASR) model for Upper Sorbian, a minority Slavic language spoken in Saxony, Germany. It is a fine-tuned version of [facebook/mms-300m](https://huggingface.co/facebook/mms-300m), trained on the train split of the Common Voice 17.0 dataset (Upper Sorbian, `hsb`).

It achieves the following results on the evaluation set (validation split):

- Loss: 0.6600
- WER: 0.4203
- CER: 0.0930
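
For reference, WER (word error rate) and CER (character error rate) can be computed with the `evaluate` library. This is a minimal sketch with placeholder strings, not the exact evaluation script used for this model:

```python
# Sketch: computing WER/CER with the `evaluate` library.
# Both metrics require the jiwer package; the strings below are
# hypothetical placeholders, not actual model output.
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

predictions = ["a transcription from the model"]   # hypothetical ASR output
references = ["the transcription from the model"]  # hypothetical ground truth

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```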

## Model description

An ASR model trained on crowdsourced speech from Mozilla Common Voice. It can be used to transcribe Upper Sorbian speech into text, as in the sketch below.
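
A minimal transcription sketch, assuming the checkpoint is hosted under the repo id `badrex/mms-300m-upper-sorbian` and that `ffmpeg` is available to decode the input file:

```python
# Sketch: transcribing an audio file with the transformers ASR pipeline.
# The repo id is assumed from this model card; the pipeline resamples
# input audio to the 16 kHz expected by the model.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="badrex/mms-300m-upper-sorbian",  # assumed repo id
)

result = asr("recording.wav")  # path to an Upper Sorbian speech recording
print(result["text"])
```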

## Intended uses & limitations

The model is intended to be used as a speech-to-text system. However, it has only been trained on scripted read speech, so it may not perform well on conversational speech.

## Training and evaluation data

Mozilla Common Voice 17.0 (Upper Sorbian, `hsb`): the train split was used for training and the validation split for evaluation. The data can be loaded as shown below.
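
A sketch of loading the training data, assuming access to the gated `mozilla-foundation/common_voice_17_0` dataset on the Hugging Face Hub:

```python
# Sketch: load the hsb train split of Common Voice 17.0 and resample
# the audio to the 16 kHz expected by MMS models. The dataset is gated,
# so its terms must be accepted on the Hub and a token configured first.
from datasets import Audio, load_dataset

cv_train = load_dataset("mozilla-foundation/common_voice_17_0", "hsb", split="train")
cv_train = cv_train.cast_column("audio", Audio(sampling_rate=16_000))
```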

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):

- learning_rate: 0.0003
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.98) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 100
- mixed_precision_training: Native AMP
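
Expressed as `transformers.TrainingArguments`, these settings correspond roughly to the sketch below; model and data preparation are omitted, and anything not listed above is left at its default:

```python
# Sketch: TrainingArguments matching the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mms-300m-upper-sorbian",
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 32
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=100,
    fp16=True,  # mixed precision ("Native AMP")
)
```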

### Training results

| Training Loss | Epoch   | Step | Validation Loss | WER    | CER    |
|:-------------:|:-------:|:----:|:---------------:|:------:|:------:|
| 3.408         | 3.9216  | 100  | 3.3797          | 1.0    | 1.0    |
| 3.1402        | 7.8431  | 200  | 3.1629          | 1.0    | 1.0    |
| 0.7479        | 11.7647 | 300  | 1.0200          | 0.9323 | 0.2916 |
| 0.2111        | 15.6863 | 400  | 0.7733          | 0.7095 | 0.1844 |
| 0.1842        | 19.6078 | 500  | 0.7090          | 0.6051 | 0.1549 |
| 0.0618        | 23.5294 | 600  | 0.7410          | 0.6184 | 0.1474 |
| 0.0802        | 27.4510 | 700  | 0.7037          | 0.55   | 0.1308 |
| 0.0392        | 31.3725 | 800  | 0.7951          | 0.5924 | 0.1430 |
| 0.0504        | 35.2941 | 900  | 0.7686          | 0.5418 | 0.1290 |
| 0.0436        | 39.2157 | 1000 | 0.7336          | 0.55   | 0.1239 |
| 0.0282        | 43.1373 | 1100 | 0.7303          | 0.5133 | 0.1211 |
| 0.0333        | 47.0588 | 1200 | 0.6966          | 0.5057 | 0.1204 |
| 0.0243        | 50.9804 | 1300 | 0.6883          | 0.4734 | 0.1088 |
| 0.0218        | 54.9020 | 1400 | 0.7155          | 0.5051 | 0.1168 |
| 0.0219        | 58.8235 | 1500 | 0.6778          | 0.4943 | 0.1111 |
| 0.0101        | 62.7451 | 1600 | 0.6565          | 0.4570 | 0.1063 |
| 0.012         | 66.6667 | 1700 | 0.6723          | 0.4405 | 0.1016 |
| 0.0233        | 70.5882 | 1800 | 0.6700          | 0.4589 | 0.1039 |
| 0.0075        | 74.5098 | 1900 | 0.7376          | 0.4570 | 0.1062 |
| 0.0165        | 78.4314 | 2000 | 0.7359          | 0.4443 | 0.1010 |
| 0.0071        | 82.3529 | 2100 | 0.7349          | 0.4532 | 0.1022 |
| 0.0055        | 86.2745 | 2200 | 0.6797          | 0.4411 | 0.0991 |
| 0.0051        | 90.1961 | 2300 | 0.7313          | 0.4354 | 0.0975 |
| 0.0062        | 94.1176 | 2400 | 0.6847          | 0.4203 | 0.0938 |
| 0.0142        | 98.0392 | 2500 | 0.6600          | 0.4203 | 0.0930 |

### Framework versions

- Transformers 4.42.0.dev0
- PyTorch 2.3.1+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1