lucio/wav2vec2-large-xlsr-kinyarwanda

lucio committed on Apr 13, 2021

Commit 1906fc0 • 1 Parent(s): 3dd85cc

Update README.md

Files changed (1)

README.md +4 -1


@@ -68,8 +68,11 @@ print("Prediction:", processor.batch_decode(predicted_ids))
 print("Reference:", test_dataset["sentence"][:2])
 ```
+Result:
+```
 Prediction: ['yaherukaga gukora igitaramo y iki mu jyiwa na mul mumbiliki', 'ini rero ntibizashoboka ka nibo nkunrabibzi']
 Reference: ['Yaherukaga gukora igitaramo nk’iki mu Mujyi wa Namur mu Bubiligi.', 'Ibi rero, ntibizashoboka, kandi nawe arabizi.']
+```
 ## Evaluation
@@ -154,6 +157,6 @@ print("WER: {:2f}".format(100 * chunked_wer(result["sentence"], result["pred_str
 ## Training
-Blocks of examples from the Common Voice training dataset (totaling about 100k examples, 20% of the available data) were used for training for 30k global steps, on 1 V100 GPU provided by OVHcloud. For validation, 2048 examples of the validation dataset were used.
+Blocks of examples from the Common Voice training dataset were used for training, after filtering out utterances that had any `down_vote` or were longer than 9.5 seconds. The data used totals about 100k examples, 20% of the available data. Training proceeded for 30k global steps, on 1 V100 GPU provided by OVHcloud. For validation, 2048 examples of the validation dataset were used.
 The [script used for training](https://github.com/serapio/transformers/blob/feature/xlsr-finetune/examples/research_projects/wav2vec2/run_common_voice.py) is adapted from the [example script provided in the transformers repo](https://github.com/huggingface/transformers/blob/master/examples/research_projects/wav2vec2/run_common_voice.py).
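
The filtering rule added to the Training section (drop utterances with any down vote or longer than 9.5 seconds) could be sketched roughly as below. This is an illustration only, not the code from the linked training script: the sample records are made up, `duration_seconds` is a hypothetical precomputed field (Common Voice does not ship a duration column, so it would have to be derived from the audio), and `down_votes` is assumed to be the per-clip vote count.

```python
# Minimal sketch of the filter described in the README, under the
# assumptions stated above. Not taken from the author's training script.

MAX_DURATION_SECONDS = 9.5  # threshold quoted in the README


def keep_example(example, max_seconds=MAX_DURATION_SECONDS):
    """Return True if the clip has no down votes and is not too long."""
    has_no_down_votes = example["down_votes"] == 0
    # "duration_seconds" is a hypothetical field computed beforehand.
    is_short_enough = example["duration_seconds"] <= max_seconds
    return has_no_down_votes and is_short_enough


def filter_examples(examples, max_seconds=MAX_DURATION_SECONDS):
    """Keep only the examples that pass keep_example."""
    return [ex for ex in examples if keep_example(ex, max_seconds)]


# Made-up records standing in for Common Voice rows.
samples = [
    {"path": "a.mp3", "down_votes": 0, "duration_seconds": 4.2},
    {"path": "b.mp3", "down_votes": 1, "duration_seconds": 3.0},   # down-voted
    {"path": "c.mp3", "down_votes": 0, "duration_seconds": 12.7},  # too long
    {"path": "d.mp3", "down_votes": 0, "duration_seconds": 9.5},   # at the limit
]

kept = filter_examples(samples)
print([ex["path"] for ex in kept])  # ['a.mp3', 'd.mp3']
```

With a Hugging Face `datasets.Dataset`, the same predicate could be passed to `Dataset.filter`, provided a duration column has been added first.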