---
license: apache-2.0
language:
- mk
library_name: speechbrain
pipeline_tag: automatic-speech-recognition
base_model:
- jonatasgrosman/wav2vec2-large-xlsr-53-russian
---

# Fine-tuned XLSR-53-russian large model for speech recognition in Macedonian

Authors:
1. Dejan Porjazovski
2. Ilina Jakimovska
3. Ordan Chukaliev
4. Nikola Stikov

This collaboration is part of the activities of the Center for Advanced Interdisciplinary Research (CAIR) at UKIM.

## Model description

This model is an attention-based encoder-decoder (AED). The encoder is a Wav2vec2 model and the decoder is RNN-based.

## Data used for training

The model was trained on around 115 hours of Macedonian speech.

## Results

The results are reported on all the test sets combined, without an external language model.

WER: 10.21 \
CER: 3.89

## Usage

The model was developed using the [SpeechBrain](https://speechbrain.github.io) toolkit. To use it, install SpeechBrain:

```
pip install speechbrain
```

SpeechBrain relies on the Transformers library, so you need to install it as well:

```
pip install transformers
```

An external `py_module_file=custom_interface.py` is used as an external Predictor class in this HF repo. We use the `foreign_class` function from `speechbrain.inference.interfaces`, which allows you to load your custom model.
```python
import torch
from speechbrain.inference.interfaces import foreign_class

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
asr_classifier = foreign_class(source="Macedonian-ASR/buki-wav2vec2-2.0",
                               pymodule_file="custom_interface_app.py",
                               classname="ASR")
asr_classifier = asr_classifier.to(device)
predictions = asr_classifier.classify_file("audio_file.wav", device)
print(predictions)
```

## Docker

A Docker image for the model with a Gradio web interface is available at: https://hub.docker.com/repository/docker/porjaz/buki-wav2vec2-2.0/general

To run the container with GPU support, first pull the image:

```
docker pull porjaz/buki-wav2vec2-2.0
```

Then run the container:

```
docker run -v models:/docker_wav2vec2_gradio_app/models -it --gpus all -p 7860:7860 porjaz/buki-wav2vec2-2.0
```

## Training

To fine-tune this model, run:

```
python train.py hyperparams.yaml
```

The `train.py` file contains the functions necessary for training the model, and `hyperparams.yaml` contains the hyperparameters. For more details about training the model, refer to the [SpeechBrain](https://speechbrain.github.io) documentation.