Tunisian Arabic ASR Model with wav2vec2
This repository provides all the tools needed to perform automatic speech recognition with an end-to-end system pretrained on the Tunisian Arabic dialect.
Performance
The performance of the model is:
Release | Version | WER (%) | CER (%) |
---|---|---|---|
v1.0 | Without LM | 11.82 | 6.33 |
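For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, expressed as a percentage. It is an illustration only, not the evaluation script used for the numbers above. The function names `edit_distance`, `wer`, and `cer` are hypothetical, and the sketch assumes whitespace tokenization for words and counts spaces as characters for CER; the actual evaluation may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent (assumption: words are split on whitespace)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (assumption: spaces count as characters)."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)
```

With these definitions, one wrong word out of four reference words gives a WER of 25%, so a WER of 11.82% corresponds to roughly one word error per eight to nine reference words.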
Dataset
This ASR model was trained on:
- TARIC: The Tunisian Arabic Railway Interaction Corpus (TARIC) is a collection of audio recordings and transcriptions of dialogues in the Tunisian Railway Transport Network.
- STAC: A corpus of spoken Tunisian Arabic.
- IWSLT: A Tunisian conversational speech corpus.
- Tunspeech: Our custom dataset.
Install
pip install speechbrain transformers