
XLS-R-based CTC model with 5-gram language model from Open Subtitles

This model is a fine-tuned version of facebook/wav2vec2-xls-r-2b-22-to-16. It was trained mainly on the CGN dataset and also on the Dutch (NL) subset of MOZILLA-FOUNDATION/COMMON_VOICE_8_0 (see details below). A large 5-gram language model built from the Open Subtitles Dutch corpus is added on top of it. This model achieves the following results on the evaluation set of Common Voice 8.0:

  • WER: 0.04057 (4.06%)
  • CER: 0.01222 (1.22%)
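WER (word error rate) and CER (character error rate) are both edit-distance ratios. Each is the number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the length of the reference. WER counts words and CER counts characters. The plain-Python sketch below is illustrative only and is not necessarily the scorer used for the numbers above. It computes corpus-level rates by summing edits and reference lengths over all utterances.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; every edit costs 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 if the symbols match)
            )
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level WER (unit='word') or CER (unit='char'): total edits / total reference length."""
    split = str.split if unit == "word" else list
    edits = sum(edit_distance(split(r), split(h)) for r, h in zip(references, hypotheses))
    total = sum(len(split(r)) for r in references)
    return edits / total


refs = ["de kat zit op de mat"]
hyps = ["de kat zat op mat"]
print(round(error_rate(refs, hyps), 3))               # 0.333 (1 substitution + 1 deletion, 6 words)
print(round(error_rate(refs, hyps, unit="char"), 3))  # 0.2 (4 character edits, 20 characters)
```

Spaces count as characters in this CER variant. Other toolkits may handle whitespace differently.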

Model description

The model takes 16 kHz audio as input. It uses a Wav2Vec2ForCTC model (a wav2vec 2.0 encoder with a CTC output layer) over a vocabulary of 48 letters to output per-frame letter-transcription probabilities.
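Without a language model, per-frame probabilities like these can be turned into text by greedy CTC decoding. This takes the most likely symbol in each frame, merges consecutive repeats, and then drops the blank token. The NumPy sketch below uses a small hypothetical vocabulary, with the blank at index 0 and `|` as the word separator (a common Wav2Vec2 tokenizer convention). The real model's 48-letter vocabulary and token indices come from its own tokenizer and are not reproduced here.

```python
import numpy as np


def greedy_ctc_decode(frame_probs, vocab, blank_id=0, word_delimiter="|"):
    """Greedy CTC decoding: argmax per frame, collapse repeats, remove blanks."""
    best_ids = np.argmax(frame_probs, axis=-1)  # one token id per frame
    chars = []
    prev = None
    for idx in best_ids:
        idx = int(idx)
        # Emit a symbol only when it differs from the previous frame and is not blank;
        # a blank between two identical letters therefore keeps both letters.
        if idx != prev and idx != blank_id:
            chars.append(vocab[idx])
        prev = idx
    text = "".join(chars).replace(word_delimiter, " ")
    return " ".join(text.split())  # collapse repeated or trailing separators


# Hypothetical toy vocabulary; index 0 plays the role of the CTC blank.
vocab = ["<pad>", "|", "d", "e", "k", "a", "t"]
frame_ids = [2, 2, 3, 0, 1, 4, 5, 5, 6, 6]      # d d e _ | k a a t t
frame_probs = np.eye(len(vocab))[frame_ids]     # one-hot "probabilities" per frame
print(greedy_ctc_decode(frame_probs, vocab))    # de kat
```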

To improve accuracy, a beam-search decoder based on pyctcdecode is then used; it reranks the most promising alignments based on a 5-gram language model trained on the Open Subtitles Dutch corpus.
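A simplified rescoring step can illustrate the effect of the language model. Each candidate transcription's acoustic log-probability is combined with a weighted LM log-probability and a per-word bonus, and the candidates are re-sorted. This is a sketch under stated assumptions. pyctcdecode applies its n-gram model during the beam search itself rather than afterwards. The toy add-one-smoothed bigram model, the candidate scores, and the `alpha`/`beta` values below are all hypothetical stand-ins for the 5-gram model and the decoder's real settings.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Toy word-bigram LM with add-one smoothing (hypothetical stand-in for the 5-gram model)."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(words[1:])
        unigrams.update(words[:-1])  # counts of each word used as a context
        bigrams.update(zip(words[:-1], words[1:]))

    def log_prob(sentence):
        words = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + len(vocab)))
            for a, b in zip(words[:-1], words[1:])
        )

    return log_prob


def rerank(candidates, lm_log_prob, alpha=0.5, beta=1.0):
    """Re-sort (text, acoustic_log_prob) pairs by acoustic + alpha * LM + beta * word count."""
    def fused_score(candidate):
        text, acoustic_log_prob = candidate
        return acoustic_log_prob + alpha * lm_log_prob(text) + beta * len(text.split())
    return sorted(candidates, key=fused_score, reverse=True)


corpus = ["ik heb een hond", "ik heb een kat", "een hond blaft"]
lm = train_bigram_lm(corpus)
candidates = [("ik heb een hont", -1.0), ("ik heb een hond", -1.2)]
print(rerank(candidates, lm)[0][0])  # ik heb een hond
```

The acoustic scores slightly prefer the misspelling "hont". The LM term outweighs that small margin because "een hond" occurs in its training text. Here `alpha` plays the role of an LM weight and `beta` the role of a length bonus, analogous to the two weights pyctcdecode exposes.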

Intended uses & limitations

This model can be used to transcribe spoken Dutch, including Flemish, to text (without punctuation).
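The model outputs text without punctuation, so reference transcripts are usually normalized the same way before they are compared with its output. The function below is a hypothetical example: it lowercases, replaces punctuation with spaces, and collapses whitespace. This card does not specify the normalization behind the reported scores. Keeping apostrophes (as in "zo'n") is an assumption that would need to match the model's actual 48-letter vocabulary.

```python
import re


def normalize_transcript(text):
    """Hypothetical normalizer: lowercase, drop punctuation except apostrophes, collapse spaces."""
    text = text.lower()
    # \w is Unicode-aware for str patterns, so accented letters such as é and ë are kept.
    text = re.sub(r"[^\w']+", " ", text)
    return " ".join(text.split())


print(normalize_transcript("Hé, zo'n mooie dag!  Toch?"))  # hé zo'n mooie dag toch
```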

Training and evaluation data

The model was:

  1. initialized with the 2B-parameter XLS-R model from Facebook;
  2. trained for 5 epochs (6,000 iterations at batch size 32) on the cv8/nl dataset;
  3. trained for 1 epoch (36,000 iterations at batch size 32) on the CGN dataset;
  4. trained for another 5 epochs (6,000 iterations at batch size 32) on the cv8/nl dataset.
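The iteration counts translate into numbers of training examples as iterations × batch size. Dividing by the number of epochs gives the implied size of one pass over each dataset. The calculation below only restates the figures listed in the steps. It assumes that 32 is the effective batch size per iteration, with no dropped batches, which the card does not state explicitly.

```python
# Training schedule from the steps above: (dataset, epochs, iterations, batch size).
schedule = [
    ("cv8/nl", 5, 6_000, 32),
    ("cgn", 1, 36_000, 32),
    ("cv8/nl", 5, 6_000, 32),
]

total_examples = 0
for dataset, epochs, iterations, batch_size in schedule:
    examples = iterations * batch_size  # utterances processed in this stage
    per_epoch = examples / epochs       # implied size of one pass over the dataset
    total_examples += examples
    print(f"{dataset}: {examples:,} examples, ~{per_epoch:,.0f} per epoch")

print(f"total: {total_examples:,} examples")
```

This gives 192,000 examples per cv8/nl stage (about 38,400 per epoch), 1,152,000 for the single CGN epoch, and 1,536,000 in total.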

Framework versions

  • Transformers 4.16.0
  • Pytorch 1.10.2+cu102
  • Datasets 1.18.3
  • Tokenizers 0.11.0

Dataset used to train FremyCompany/xls-r-2b-nl-v2_lm-5gram-os

Evaluation results