	---
	language:
	- am
	license: mit
	tags:
	- automatic-speech-recognition
	- speech
	metrics:
	- wer
	- cer
	pipeline_tag: automatic-speech-recognition
	---

	# Amharic ASR using fine-tuned Wav2vec2 XLSR-53
	This is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53), trained on the [Amharic Speech Corpus](http://www.openslr.org/25/). The corpus was produced by [Abate et al. (2005)](https://www.isca-speech.org/archive/interspeech_2005/abate05_interspeech.html) (DOI: 10.21437/Interspeech.2005-467).

	The model achieves a WER of 26% and a CER of 7% on the validation set of the Amharic Readspeech data.
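
For readers unfamiliar with these metrics, the following is a minimal sketch of how WER and CER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, at word and character level respectively. It is not the evaluation code used for this model; the function names and the English placeholder strings are illustrative assumptions, and counting spaces as characters in CER is one convention among several.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate (spaces counted); assumes a non-empty reference."""
    return edit_distance(reference, hypothesis) / len(reference)


# Placeholder strings, not taken from the corpus.
reference = "the cat sat"
hypothesis = "the cat sit"
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

One wrong word out of three gives a WER of 1/3, while the single wrong character out of eleven gives a much lower CER, which is why CER is typically well below WER.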

	## Usage
	The model can be used as follows:
	```python
	import librosa
	import torch
	from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

	model = Wav2Vec2ForCTC.from_pretrained("agkphysics/wav2vec2-large-xlsr-53-amharic")
	processor = Wav2Vec2Processor.from_pretrained("agkphysics/wav2vec2-large-xlsr-53-amharic")

	# Load the audio and resample it to 16 kHz
	audio, _ = librosa.load("/path/to/audio.wav", sr=16000)

	input_values = processor(
	    audio.squeeze(),
	    sampling_rate=16000,
	    return_tensors="pt"
	).input_values

	model.eval()
	with torch.no_grad():
	    logits = model(input_values).logits

	# Greedy decoding: take the most likely token at each frame
	preds = logits.argmax(-1)
	texts = processor.batch_decode(preds)
	print(texts[0])
	```
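
The `argmax` followed by `batch_decode` above amounts to greedy CTC decoding. As a rough illustration of that idea (not the processor's actual implementation), the sketch below collapses consecutive repeated token IDs and then drops the blank token. The toy vocabulary, the `blank_id` value, and the function name `ctc_greedy_decode` are hypothetical.

```python
def ctc_greedy_decode(ids, id_to_char, blank_id=0):
    """Collapse repeated IDs, then remove blanks (greedy CTC decoding)."""
    chars = []
    prev = None
    for i in ids:
        # Emit a character only when the ID changes and is not the blank.
        if i != prev and i != blank_id:
            chars.append(id_to_char[i])
        prev = i
    return "".join(chars)


# Hypothetical toy vocabulary; ID 0 plays the role of the CTC blank.
vocab = {0: "<blank>", 1: "a", 2: "b"}

# Per-frame argmax IDs, as would come from logits.argmax(-1) for one utterance.
frame_ids = [1, 1, 0, 1, 2, 2, 0, 0]
print(ctc_greedy_decode(frame_ids, vocab))  # prints "aab"
```

The blank between the two runs of `1` is what allows a genuinely doubled character to survive the collapsing step.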

	## Training
	The code to train this model is available at https://github.com/agkphysics/amharic-asr.