Fine-tuned whisper-large-v3 model for speech recognition in Macedonian

Authors:

Dejan Porjazovski
Ilina Jakimovska
Ordan Chukaliev
Nikola Stikov

This collaboration is part of the activities of the Center for Advanced Interdisciplinary Research (CAIR) at UKIM.

Data used for training

The model is trained on around 115 hours of Macedonian speech.

Model description

This model is a fine-tuned version of whisper-large-v3. During fine-tuning, the encoder was kept frozen and only the decoder was optimized. The model was trained on data containing capitalisation and punctuation. Although the model produces transcripts with proper capitalisation and punctuation, its recognition accuracy is lower than that of Macedonian-ASR/buki-whisper-2.0.
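The frozen-encoder strategy can be expressed in plain PyTorch/Transformers terms as in the sketch below. It is not the authors' training code: the card states the model was built with SpeechBrain, whose own Whisper wrapper may handle freezing through its own options. So that the sketch runs offline, it builds a tiny, randomly initialised Whisper from a hypothetical WhisperConfig; a real run would instead load WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3"). The learning rate is a placeholder, not the value used for this model.

```python
import torch
from transformers import WhisperConfig, WhisperForConditionalGeneration


def freeze_encoder(model):
    """Disable gradients for every encoder parameter; the decoder stays trainable."""
    for param in model.get_encoder().parameters():
        param.requires_grad = False
    return model


def count_parameters(model):
    """Return (trainable, frozen) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    return trainable, frozen


# Hypothetical tiny configuration so the example runs without downloading weights.
config = WhisperConfig(
    vocab_size=64, num_mel_bins=80, d_model=32,
    encoder_layers=1, decoder_layers=1,
    encoder_attention_heads=2, decoder_attention_heads=2,
    encoder_ffn_dim=64, decoder_ffn_dim=64,
    max_source_positions=30, max_target_positions=20,
    pad_token_id=0, bos_token_id=1, eos_token_id=2, decoder_start_token_id=1,
)
model = freeze_encoder(WhisperForConditionalGeneration(config))

# Only the decoder's (trainable) parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-5,  # placeholder learning rate
)

trainable, frozen = count_parameters(model)
print(f"trainable: {trainable:,}  frozen: {frozen:,}")
```

Passing only the trainable parameters to the optimizer avoids keeping optimizer state for the frozen encoder, which matters at the scale of the large model.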

Results

The results are reported on all the test sets combined.

WER: 18.69
CER: 22.91
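WER and CER are edit-distance metrics: the number of substitutions, deletions and insertions needed to turn a hypothesis into its reference, divided by the number of reference words (WER) or characters (CER). The card does not say how the combined figure was computed or whether text was normalised before scoring. The sketch below assumes corpus-level pooling (total errors over total reference units) on raw text, with CER counting spaces as characters; the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1]


def corpus_error_rate(refs, hyps, unit="word"):
    """Pooled error rate in percent over paired reference/hypothesis strings."""
    if unit not in ("word", "char"):
        raise ValueError("unit must be 'word' or 'char'")
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        errors += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    if total == 0:
        raise ValueError("references are empty")
    return 100.0 * errors / total


if __name__ == "__main__":
    refs = ["Добар ден."]
    hyps = ["добар ден"]
    print(corpus_error_rate(refs, hyps, unit="word"))  # 100.0
    print(corpus_error_rate(refs, hyps, unit="char"))  # 20.0
```

The example shows why scoring choices matter for a capitalised, punctuated model: a hypothesis that differs only in case and a final full stop counts as two word errors out of two, but only two character errors out of ten.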

Usage

The model is developed using the SpeechBrain toolkit. To use it, you need to install SpeechBrain with:

pip install speechbrain

SpeechBrain relies on the Transformers library, so you also need to install it:

pip install transformers
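Before running the usage code, it can help to confirm the installation. The usage code imports from speechbrain.inference, a module introduced in SpeechBrain 1.0 (earlier releases exposed the same interfaces under speechbrain.pretrained). This standard-library sketch reports installed versions and checks the SpeechBrain major version; installed_versions and has_inference_module are illustrative helper names.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = ("speechbrain", "transformers", "torch")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version string, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def has_inference_module(speechbrain_version):
    """True if the version is 1.x or later, where speechbrain.inference exists."""
    if speechbrain_version is None:
        return False
    major = speechbrain_version.split(".")[0]
    return major.isdigit() and int(major) >= 1


if __name__ == "__main__":
    versions = installed_versions()
    for name, ver in versions.items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
    if not has_inference_module(versions["speechbrain"]):
        print("SpeechBrain 1.0 or later is needed for speechbrain.inference.")
```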

The inference code lives in an external module, custom_interface.py, stored in this Hugging Face repository; it defines the ASR predictor class. The foreign_class function from speechbrain.inference.interfaces fetches that module (the pymodule_file argument) and instantiates the class named by classname:

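import torch
from speechbrain.inference.interfaces import foreign_class
# Use a GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Fetch the repository files and instantiate the ASR class defined in custom_interface.py
asr_classifier = foreign_class(source="Macedonian-ASR/buki-whisper-capitalised-2.0", pymodule_file="custom_interface.py", classname="ASR")
asr_classifier = asr_classifier.to(device)
# Transcribe a single audio file
predictions = asr_classifier.classify_file("audio_file.wav", device)
print(predictions)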
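To transcribe several files, the single-file call can be wrapped in a loop. Whisper models operate on 16 kHz audio; the card does not say whether custom_interface.py resamples other rates, so this sketch only warns when a file's rate differs. It assumes classify_file(path, device) behaves as in the usage example; transcribe_directory and wav_sample_rate are hypothetical helper names, and the standard wave module reads only uncompressed PCM .wav files.

```python
import warnings
import wave
from pathlib import Path

# Whisper's feature extraction works on 16 kHz audio.
EXPECTED_SAMPLE_RATE = 16_000


def wav_sample_rate(path):
    """Return the sample rate of an uncompressed PCM .wav file."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate()


def transcribe_directory(asr, directory, device, expected_rate=EXPECTED_SAMPLE_RATE):
    """Transcribe every .wav file in `directory` and return {filename: prediction}.

    `asr` is assumed to expose classify_file(path, device). Files at an unexpected
    sample rate trigger a warning but are still transcribed, because whether the
    custom interface resamples is not documented.
    """
    results = {}
    for path in sorted(Path(directory).glob("*.wav")):
        rate = wav_sample_rate(path)
        if rate != expected_rate:
            warnings.warn(f"{path.name}: {rate} Hz, expected {expected_rate} Hz")
        results[path.name] = asr.classify_file(str(path), device)
    return results
```

With the objects from the usage example, a call might look like transcribe_directory(asr_classifier, "recordings", device), where "recordings" is a hypothetical folder of .wav files.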

Training

To fine-tune this model, you need to run:

python train.py hyperparams.yaml

The train.py file contains the functions needed to train the model, and hyperparams.yaml contains the hyperparameters. For more details about training, refer to the SpeechBrain documentation.
