reazonspeech-espnet-v2
reazonspeech-espnet-v2 is an automatic speech recognition (ASR) model
trained on the ReazonSpeech v2.0 corpus.
Model Architecture
The general architecture is the same as reazonspeech-espnet-v1.
It is a Conformer-Transducer model with 118.85M parameters.
We trained this model for 33 epochs using the Adam optimizer. The maximum learning rate was 0.02, with 15000 warmup steps.
The training audio files were sampled at 16 kHz. Make sure that your input audio files have the same sampling rate.
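To make the learning-rate settings above concrete, here is a minimal sketch of a warmup schedule that peaks at 0.02 after 15000 steps. The model card does not name the scheduler, so the Noam-style formula below (linear warmup followed by inverse square-root decay) is an assumption, and `warmup_lr` is a hypothetical helper, not part of the training code.

```python
def warmup_lr(step: int, max_lr: float = 0.02, warmup_steps: int = 15000) -> float:
    """Learning rate at a given optimizer step (1-based).

    ASSUMPTION: Noam-style schedule. The rate grows linearly until
    `warmup_steps`, reaches `max_lr` there, then decays as 1/sqrt(step).
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    scale = min(step ** -0.5, step * warmup_steps ** -1.5)
    return max_lr * warmup_steps ** 0.5 * scale


if __name__ == "__main__":
    # Print the rate during warmup, at the peak, and during decay.
    for step in (1, 7500, 15000, 60000):
        print(step, f"{warmup_lr(step):.6f}")
```

Under this assumption the rate is half the maximum midway through warmup and falls back to half the maximum again at four times the warmup length.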
Usage
We recommend using this model through our reazonspeech library.
from reazonspeech.espnet.asr import load_model, transcribe, audio_from_path
audio = audio_from_path("speech.wav")
model = load_model()
ret = transcribe(model, audio)
print(ret.text)
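Because the model was trained on 16 kHz audio, it can help to check an input file before transcribing it. The following is a minimal sketch using only the Python standard library. It assumes an uncompressed PCM WAV file, and `wav_sample_rate` and `has_expected_rate` are hypothetical helper names, not part of the reazonspeech library.

```python
import wave

EXPECTED_RATE = 16000  # 16 kHz, the sampling rate of the training audio


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate in Hz of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def has_expected_rate(path: str, expected: int = EXPECTED_RATE) -> bool:
    """Return True if the file's sampling rate matches the expected rate."""
    return wav_sample_rate(path) == expected
```

For example, calling `has_expected_rate("speech.wav")` before `audio_from_path("speech.wav")` would flag a file that needs resampling. Whether the library resamples internally is not stated here, so this check is only a precaution.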
License