arxiv:2107.11414

Brazilian Portuguese Speech Recognition Using Wav2vec 2.0

Published on Jul 23, 2021

Authors:

Abstract

Deep learning techniques have been shown to be efficient in various tasks, especially in the development of speech recognition systems, that is, systems that aim to transcribe an audio sentence in a sequence of written words. Despite the progress in the area, speech recognition can still be considered difficult, especially for languages lacking available data, such as Brazilian Portuguese (BP). In this sense, this work presents the development of an public Automatic Speech Recognition (ASR) system using only open available audio data, from the fine-tuning of the Wav2vec 2.0 XLSR-53 model pre-trained in many languages, over BP data. The final model presents an average word error rate of 12.4% over 7 different datasets (10.5% when applying a language model). According to our knowledge, the obtained error is the lowest among open end-to-end (E2E) ASR models for BP.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2107.11414 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2107.11414 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.