---
language:
- am
license: mit
tags:
- automatic-speech-recognition
- speech
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
---

# Amharic ASR using fine-tuned Wav2vec2 XLSR-53

This is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) trained on the [Amharic Speech Corpus](http://www.openslr.org/25/). This corpus was produced by [Abate et al. (2005)](https://www.isca-speech.org/archive/interspeech_2005/abate05_interspeech.html) (doi:10.21437/Interspeech.2005-467).

The model achieves a word error rate (WER) of 26% and a character error rate (CER) of 7% on the validation set of the Amharic Readspeech data.
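
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, taken over words for WER and over characters for CER. It is an illustration only, not the evaluation script used for this model. The function names are hypothetical, and it assumes a non-empty reference and counts spaces as characters in the CER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Under this definition, one wrong word in a four-word reference gives a WER of 0.25. The CER is usually much lower than the WER because a single wrong character makes the whole word count as an error.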

## Usage

The model can be used as follows:

```python
import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model = Wav2Vec2ForCTC.from_pretrained("agkphysics/wav2vec2-large-xlsr-53-amharic")
processor = Wav2Vec2Processor.from_pretrained("agkphysics/wav2vec2-large-xlsr-53-amharic")

# Load the audio as mono, resampled to the 16 kHz rate the model expects.
audio, _ = librosa.load("/path/to/audio.wav", sr=16000)

input_values = processor(
    audio.squeeze(),
    sampling_rate=16000,
    return_tensors="pt"
).input_values

model.eval()
with torch.no_grad():
    logits = model(input_values).logits

# Greedy decoding: pick the most likely token at each frame, then decode to text.
preds = logits.argmax(-1)
texts = processor.batch_decode(preds)
print(texts[0])
```
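
The last two steps of the snippet above (`argmax` followed by `batch_decode`) amount to greedy CTC decoding: consecutive repeated token ids are collapsed, blank tokens are dropped, and the remaining tokens are joined into text. The sketch below illustrates the idea in plain Python. The toy vocabulary, the `ctc_greedy_decode` name, and the choice of `<pad>` as blank and `|` as word delimiter are assumptions made for illustration (they mirror the defaults of `Wav2Vec2CTCTokenizer`), not values read from this model's tokenizer.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse consecutive repeats, then drop blank tokens."""
    tokens = []
    prev = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the CTC blank.
        if idx != prev and idx != blank_id:
            tokens.append(id_to_token[idx])
        prev = idx
    return "".join(tokens)


# Hypothetical toy vocabulary: id 0 is the blank, "|" marks a word boundary.
vocab = {0: "<pad>", 1: "a", 2: "b", 3: "|"}

# Per-frame argmax ids, as produced by logits.argmax(-1) for one utterance.
frames = [1, 1, 0, 1, 2, 2, 0, 0, 3, 3, 1]

decoded = ctc_greedy_decode(frames, vocab).replace("|", " ")
print(decoded)  # aab a
```

The blank between the first two runs of `1` is what lets the decoder emit a genuinely doubled letter instead of collapsing it into one.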

## Training

The code to train this model is available at https://github.com/agkphysics/amharic-asr.