Model
This model is Wav2Vec2-Large-XLSR-53 fine-tuned on the manually annotated subset of CMU's L2-Arctic dataset to produce automatic phonetic transcriptions in IPA. The fine-tuning followed a procedure similar to the one vitouphy described for the TIMIT dataset.
Usage
To use the model, create a pipeline and call it with the path to your WAV file, which must be sampled at 16 kHz.
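from transformers import pipeline

# Load the fine-tuned checkpoint by its model ID.
pipe = pipeline(model="mrrubino/wav2vec2-large-xlsr-53-l2-arctic-phoneme")
# Transcribe a 16 kHz WAV file and keep the "text" field of the result.
transcription = pipe("file.wav")["text"]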
Results
The manually annotated subset of L2-Arctic was divided into training and testing datasets with a 90/10 split. The performance metrics for the testing dataset are included below.
WER (word error rate) - 0.425
CER (character error rate) - 0.128
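For reference, both metrics are conventionally defined as the edit distance between the reference and the predicted transcription, divided by the length of the reference, counted in whitespace-separated tokens for WER and in characters for CER. The sketch below illustrates that standard definition in plain Python. It is not necessarily the evaluation code used to produce the numbers above, and the function names are illustrative. Counting spaces as characters in `cer` is an assumption; evaluation libraries differ on this.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate; spaces are counted as characters here (an assumption)."""
    return error_rate(list(reference), list(hypothesis))
```

Under this definition, a WER of 0.425 means the number of token-level edits equals 42.5% of the number of reference tokens. Both rates can exceed 1.0 when the prediction is much longer than the reference.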