Fine-tuned XLSR-53-russian large model for speech recognition in Macedonian
Authors:
- Dejan Porjazovski
- Ilina Jakimovska
- Ordan Chukaliev
- Nikola Stikov
This collaboration is part of the activities of the Center for Advanced Interdisciplinary Research (CAIR) at UKIM.
Model description
This model is an attention-based encoder-decoder (AED). The encoder is a Wav2vec2 model and the decoder is RNN-based.
Data used for training
The model is trained on around 115 hours of Macedonian speech.
Results
The results are reported on all the test sets combined and without an external language model.
WER: 10.21
CER: 3.89
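For readers unfamiliar with these metrics, the following is a minimal sketch of how corpus-level WER and CER are commonly computed: the total edit distance divided by the total number of reference units, in percent. It is not the scoring script used to produce the numbers above. The function names, the example sentences, and the choice to count spaces as characters in CER are all assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level error rate in percent: total edits / total reference units."""
    total_edits = 0
    total_units = 0
    for ref, hyp in zip(references, hypotheses):
        # Assumption: CER treats every character, including spaces, as a unit.
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        total_edits += edit_distance(ref_units, hyp_units)
        total_units += len(ref_units)
    return 100.0 * total_edits / total_units


# Hypothetical reference/hypothesis pair with one wrong character in one word.
refs = ["добар ден на сите"]
hyps = ["добар дан на сите"]
print(f"WER: {error_rate(refs, hyps, 'word'):.2f}")
print(f"CER: {error_rate(refs, hyps, 'char'):.2f}")
```

A single wrong character costs one full word in WER but only one character in CER, which is why CER is typically much lower than WER for the same output.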
Usage
The model is developed using the SpeechBrain toolkit. To use it, you need to install SpeechBrain with:
pip install speechbrain
SpeechBrain relies on the Transformers library, so you also need to install it:
pip install transformers
An external module file (pymodule_file=custom_interface.py) provides the custom Predictor class for this HF repo. We use the foreign_class function from speechbrain.inference.interfaces, which allows you to load your custom model.
import torch
from speechbrain.inference.interfaces import foreign_class
# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
asr_classifier = foreign_class(source="Macedonian-ASR/buki-wav2vec2-2.0", pymodule_file="custom_interface_app.py", classname="ASR")
asr_classifier = asr_classifier.to(device)
# Transcribe a single audio file and print the result.
predictions = asr_classifier.classify_file("audio_file.wav", device)
print(predictions)
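To transcribe several recordings, the single-file call can be wrapped in a small loop. The sketch below assumes only that the loaded object exposes classify_file(path, device) as in the usage snippet. The helper name transcribe_folder and the folder name in the usage comment are hypothetical, and the returned values are whatever the custom interface produces.

```python
from pathlib import Path


def transcribe_folder(asr, folder, device, pattern="*.wav"):
    """Call asr.classify_file on each matching file and collect outputs by file name."""
    results = {}
    # Sort the paths so the output order is deterministic.
    for wav_path in sorted(Path(folder).glob(pattern)):
        results[wav_path.name] = asr.classify_file(str(wav_path), device)
    return results


# Hypothetical usage, with asr_classifier and device set up as in the snippet above:
# transcripts = transcribe_folder(asr_classifier, "recordings", device)
# for name, text in transcripts.items():
#     print(name, text)
```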
Docker
A Docker image for the model with a Gradio web interface is available here: https://hub.docker.com/repository/docker/porjaz/buki-wav2vec2-2.0/general
To run the container with GPU support, first pull the image:
docker pull porjaz/buki-wav2vec2-2.0
Then, run the container:
docker run -v models:/docker_wav2vec2_gradio_app/models -it --gpus all -p 7860:7860 porjaz/buki-wav2vec2-2.0
Training
To fine-tune this model, you need to run:
python train.py hyperparams.yaml
The train.py file contains the functions necessary for training the model, and hyperparams.yaml contains the hyperparameters. For more details about training the model, refer to the SpeechBrain documentation.