	---
	license: cc-by-nc-4.0
	language: ddn
	metrics:
	- wer
	tags:
	- text-to-audio
	- automatic-speech-recognition
	- wav2vec2-fine-tuning
	- dendi-text-to-speech
	model-index:
	- name: Dendi Numerals ASR
	  results:
	  - task:
	      name: Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: dendi
	      type: dendi_numbers_dataset
	    metrics:
	    - name: Test WER
	      type: wer
	      value: 18.18
	pipeline_tag: automatic-speech-recognition
	---

	# CreaTiv Team (CTT): Dendi Numerals Automatic Speech Recognition

	This repository contains an Automatic Speech Recognition (ASR) model built specifically for recognizing numerals in the Dendi (ddn) language.
	The model transcribes spoken Dendi numbers ranging from 0 to 1,000,000,000.

	This model is part of Creativ Team's [Noulinmon](https://noulinmon.baruwuu.bj/) project, a user-friendly mobile app designed to make calculations accessible in six local languages of Benin, featuring voice reading and AI capabilities.
	You can find more CTT-ASR models on the Hugging Face Hub: [ssid32/ctt-asr](https://huggingface.co/models?sort=trending&search=ssid32).

	CTT-ASR is available in the 🤗 Transformers library from version 4.4 onwards.

	## Model Details

	The model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Dendi.
	When using this model, make sure that your speech input is sampled at 16 kHz.
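
As a quick pre-flight check for the 16 kHz requirement, the sketch below reads the sample rate from a WAV header using only the Python standard library. The helper name `check_wav_sample_rate` is hypothetical (it is not part of this repository or of 🤗 Transformers), and the sketch assumes the input is an uncompressed PCM WAV file.

```python
import wave

TARGET_SAMPLE_RATE = 16_000  # sampling rate stated in this model card


def check_wav_sample_rate(path, expected=TARGET_SAMPLE_RATE):
    """Return (ok, actual_rate) for a PCM WAV file.

    Hypothetical helper for illustration only.
    """
    with wave.open(path, "rb") as wav_file:
        actual = wav_file.getframerate()
    return actual == expected, actual
```

If the check fails, resample the audio to 16 kHz before passing it to the processor.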


	## Usage

	To use this model, first install the latest version of the 🤗 Transformers library:

	```
	pip install --upgrade transformers accelerate torch torchaudio
	```

	Then, run inference with the following code snippet:

	```python
	import torch
	import torchaudio
	from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

	processor = Wav2Vec2Processor.from_pretrained("ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals")
	model = Wav2Vec2ForCTC.from_pretrained("ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals")

	# Load the audio, mix it down to mono, and resample to 16 kHz if needed.
	speech_array, sampling_rate = torchaudio.load("audio_test.wav")
	speech_array = speech_array.mean(dim=0)
	if sampling_rate != 16_000:
	    speech_array = torchaudio.functional.resample(speech_array, sampling_rate, 16_000)
	inputs = processor(speech_array.numpy(), sampling_rate=16_000, return_tensors="pt", padding=True)

	# Run the model without tracking gradients, then decode the most likely token per frame.
	with torch.no_grad():
	    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
	output = processor.batch_decode(torch.argmax(logits, dim=-1))

	print("Output:", output)
	```
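
The `argmax` followed by `batch_decode` in the snippet above amounts to greedy CTC decoding: pick the most likely token for each audio frame, merge consecutive repeats, then drop the blank token. The sketch below illustrates that collapse step in plain Python. The toy vocabulary and its ids are made up for illustration and are not the model's real tokenizer; it also assumes the blank token has id 0.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse per-frame argmax ids into a label sequence.

    Consecutive duplicates are merged first, then blanks are removed.
    """
    return [token for token, _ in groupby(frame_ids) if token != blank_id]


# Toy vocabulary with hypothetical ids (not the model's actual vocabulary).
vocab = {0: "<pad>", 1: "n", 2: "d", 3: "a"}

# Per-frame argmax ids, as torch.argmax(logits, dim=-1) would produce for one utterance.
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]

collapsed = ctc_greedy_collapse(frames)
text = "".join(vocab[i] for i in collapsed)
print(text)  # nda
```

Because repeats are merged before blanks are removed, a blank between two identical tokens keeps both of them, which is how doubled letters survive decoding.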



	You can listen to the sample audio here:

	<audio controls>
	<source src="https://huggingface.co/ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals/resolve/main/audio_test.wav" type="audio/wav">
	Your browser does not support the audio element.
	</audio>

	Upon processing the sample audio, the model produces the following output:

	```
	Output: ['zangu ihaaku nda weiguu']
	```

	In this case, the output represents the numeral 850 in the Dendi language.

	### Evaluation result

	On the test set, the model achieves a Word Error Rate (WER) of 18.18%.
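
For readers unfamiliar with the metric, WER is the word-level edit distance between the reference and the hypothesis (substitutions + deletions + insertions) divided by the number of reference words. The sketch below is a minimal pure-Python implementation, not the evaluation script used for this model. The hypothesis string is invented for illustration; only the reference comes from the sample output above.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


reference = "zangu ihaaku nda weiguu"
hypothesis = "zangu ihaaku weiguu"  # made-up transcript with one word missing
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")  # WER: 25.00%
```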

	## Authors

	This model was developed by:
	- Salim KORA GUERA (Hugging Face username: [ssid32](https://huggingface.co/ssid32))
	- Etienne TOVIMAFA (Hugging Face username: [MrBendji](https://huggingface.co/MrBendji))

	## Citation

	```bibtex
	@misc {ssid32_2024_dendi_numerals,
	author = { {Salim KORA GUERA and Etienne TOVIMAFA} },
	title = { wav2vec2-xlsr-dendi-ddn-for-numerals },
	year = 2024,
	url = { https://huggingface.co/ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals },
	doi = { 10.57967/hf/2930 },
	publisher = { Hugging Face }
	}
	```

	## License

	The model is licensed under CC BY-NC 4.0.