Tunisian Arabic ASR Model with wav2vec2

This repository provides all the necessary tools to perform automatic speech recognition with an end-to-end system pretrained on the Tunisian Arabic dialect.

Performance

The performance of the model is:

Release Version	LM	WER (%)	CER (%)
v1.0	Without LM	11.82	6.33
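
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how word error rate (WER) and character error rate (CER) are commonly computed from a reference transcript and a model hypothesis. It is not the evaluation script used for this model. The helper names `edit_distance`, `wer`, and `cer` are illustrative, and the sketch assumes whitespace tokenization for WER and counts spaces as characters for CER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (minimum number of substitutions, insertions, and deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent (assumes whitespace tokenization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (spaces count as characters here)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat"
    hyp = "the cat sit"
    print(f"WER: {wer(ref, hyp):.2f}%")
    print(f"CER: {cer(ref, hyp):.2f}%")
```

Both metrics divide the edit distance by the length of the reference, so lower is better and values above 100% are possible when the hypothesis is much longer than the reference.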

Dataset

This ASR model was trained on:

TARIC: The Tunisian Arabic Railway Interaction Corpus, a collection of audio recordings and transcriptions of dialogues in the Tunisian Railway Transport Network - TARIC Corpus
STAC: A corpus of spoken Tunisian Arabic - STAC Corpus
IWSLT: A Tunisian conversational speech corpus - IWSLT Corpus
Tunspeech: Our custom dataset

Install

pip install speechbrain transformers
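
After installing, a quick check such as the sketch below can confirm that both packages are visible to the current Python environment. It uses only the standard library; the helper name `installed_version` is illustrative and not part of either package.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("speechbrain", "transformers"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```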