reazonspeech-espnet-v2

reazonspeech-espnet-v2 is an automatic speech recognition (ASR) model trained on the ReazonSpeech v2.0 corpus.

Model Architecture

The general architecture is the same as reazonspeech-espnet-v1.

Conformer-Transducer model with 118.85M parameters.
We trained this model for 33 epochs using the Adam optimizer. The maximum learning rate was 0.02, with 15000 warmup steps.
The training audio files were sampled at 16 kHz. Make sure that your input audio files have the same sampling rate.
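
To give a feel for what a peak learning rate of 0.02 with 15000 warmup steps can look like, here is a minimal sketch. The card does not say which scheduler was used, so the shape below is an assumption: a Noam-style schedule (linear warmup to the peak, then inverse-square-root decay), which is a common choice for Conformer training. The function name `warmup_lr` is hypothetical.

```python
def warmup_lr(step: int, peak_lr: float = 0.02, warmup_steps: int = 15000) -> float:
    """Learning rate at a given optimizer step (1-based).

    ASSUMPTION: Noam-style schedule. The model card only states the peak
    learning rate (0.02) and the number of warmup steps (15000); the shape
    of the curve is illustrative, not confirmed by the card.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    # Linear growth while step < warmup_steps, inverse-sqrt decay afterwards.
    # The two branches meet at step == warmup_steps, where the rate peaks.
    scale = min(step ** -0.5, step * warmup_steps ** -1.5)
    return peak_lr * warmup_steps ** 0.5 * scale


# Print the rate at a few steps before, at, and after the warmup boundary.
for s in (1, 7500, 15000, 60000):
    print(s, warmup_lr(s))
```

Under this assumed shape, the rate reaches its maximum exactly at step 15000 and falls off slowly afterwards.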

Usage

We recommend using this model through our reazonspeech library.

from reazonspeech.espnet.asr import load_model, transcribe, audio_from_path

# Read the input audio file
audio = audio_from_path("speech.wav")
# Load the ASR model
model = load_model()
# Run recognition and print the transcribed text
ret = transcribe(model, audio)
print(ret.text)
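
Because the model was trained on 16 kHz audio, it can be useful to check a file's sampling rate before transcribing it. The following is a minimal sketch using only Python's standard `wave` module, so it applies to PCM WAV files only. The helper names `wav_sample_rate` and `is_expected_rate` are hypothetical and not part of the reazonspeech library. Whether the library resamples input for you is not stated in this card.

```python
import wave

# 16 kHz, the sampling rate of the training audio according to this card.
EXPECTED_RATE = 16000


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate in Hz stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def is_expected_rate(path: str, expected: int = EXPECTED_RATE) -> bool:
    """Return True if the WAV file at `path` is sampled at `expected` Hz."""
    return wav_sample_rate(path) == expected
```

For example, calling `is_expected_rate("speech.wav")` before `audio_from_path("speech.wav")` tells you whether the file needs to be resampled with a tool of your choice first.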

License

Apache License 2.0
