cwkeam/mctct-large

Automatic Speech Recognition


cwkeam committed on May 5, 2022

Commit 043ac42 · 1 Parent(s): 7ea717e

add model card

Files changed (1)

README.md +2 -2

@@ -2,7 +2,7 @@
 Massively multilingual speech recognizer from Meta AI. The model is a 1B-param transformer encoder, with a CTC head over 8065 character labels and a language identification head over 60 language ID labels. It is trained on Common Voice (version 6.1, December 2020 release) and VoxPopuli. After training on Common Voice and VoxPopuli, the model is trained on Common Voice only. The labels are unnormalized character-level transcripts (punctuation and capitalization are not removed). The model takes as input Mel filterbank features from a 16Khz audio signal.
-![model image](https://github.com/cwkeam/scientific-images/blob/main/MCTCT/mctct-arch.png)
+![model image](https://raw.githubusercontent.com/cwkeam/scientific-images/main/MCTCT/mctct-arch.png)
 The original Flashlight code, model checkpoints, and Colab notebook can be found at https://github.com/flashlight/wav2letter/tree/main/recipes/mling_pl .
@@ -26,7 +26,7 @@ Additional thanks to [Chan Woo Kim](https://huggingface.co/cwkeam) and [Patrick
 # Training method
-![model image](https://github.com/cwkeam/scientific-images/blob/main/MCTCT/mctct-slimipl.png) TO-DO: replace with the training diagram from paper
+![model image](https://raw.githubusercontent.com/cwkeam/scientific-images/main/MCTCT/mctct-slimipl.png) TO-DO: replace with the training diagram from paper
 For more information on how the model was trained, please take a look at the [official paper](https://arxiv.org/abs/2111.00161).
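
Both changed lines swap a `github.com/.../blob/...` image link for its `raw.githubusercontent.com` equivalent. A `blob` URL points at GitHub's HTML file viewer rather than the image bytes, so it does not render when embedded as a Markdown image. The rewrite can be sketched in Python as below. The helper name `github_blob_to_raw` is hypothetical and not part of this repository; it assumes the `<owner>/<repo>/blob/<ref>/<path>` layout seen in the two URLs of this commit.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str) -> str:
    """Rewrite a github.com 'blob' page URL to its raw.githubusercontent.com form.

    Hypothetical helper for illustration. URLs that do not match the
    <owner>/<repo>/blob/<ref>/<path> layout are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")

    # Need at least owner, repo, "blob", ref, and one path component.
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url

    # Drop the "blob" segment; everything after it (ref + file path) is kept.
    owner, repo, _blob, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit(
        (parts.scheme, "raw.githubusercontent.com", raw_path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    old = "https://github.com/cwkeam/scientific-images/blob/main/MCTCT/mctct-arch.png"
    print(github_blob_to_raw(old))
```

Applied to the two removed lines of the diff, this produces the two added lines. Links that are not `blob` URLs, such as the `tree` link to the Flashlight recipe, pass through untouched.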