reazonspeech-nemo-v2
reazonspeech-nemo-v2 is an automatic speech recognition (ASR) model trained on the ReazonSpeech v2.0 corpus.
It supports inference on long-form Japanese audio clips up to several hours long.
Model Architecture
The model uses the improved Conformer architecture introduced in "Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition".
It is a subword-based RNN-T model with 619M parameters in total.
The encoder uses Longformer attention with a local context size of 256 and a single global token.
The decoder has a vocabulary of 3000 tokens, built with a SentencePiece unigram tokenizer.
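To make the encoder's attention pattern concrete, here is a minimal sketch of a Longformer-style mask: each frame attends to a local window, and one global token attends to, and is attended by, every frame. The function name `longformer_mask`, the even 128/128 split of the 256-frame context, and placing the global token at position 0 are assumptions for illustration; the model card does not specify them.

```python
def longformer_mask(seq_len, left=128, right=128, global_positions=(0,)):
    """Build a seq_len x seq_len boolean attention mask.

    mask[i][j] is True when query frame i may attend to key frame j.
    The window split (left/right) and the global position are
    illustrative assumptions, not values taken from the model.
    """
    global_set = set(global_positions)
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            # Local attention: key j lies within the window around query i.
            in_window = -left <= j - i <= right
            # Global attention: a global token sees, and is seen by, every frame.
            is_global = i in global_set or j in global_set
            row.append(in_window or is_global)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Tiny example: 8 frames, one frame of context on each side.
    for row in longformer_mask(8, left=1, right=1):
        print("".join("x" if allowed else "." for allowed in row))
```

In the tiny example, frame 4 can attend to frames 3 to 5 plus the global frame 0. Because each row allows only a fixed-size window plus the global token, the number of allowed pairs grows roughly linearly with the audio length, which is what makes multi-hour inputs tractable.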
We trained this model for 1 million steps using the AdamW optimizer with a Noam annealing schedule.
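For readers unfamiliar with Noam annealing, the sketch below shows the commonly used form of the schedule: the learning rate rises linearly during warmup, then decays with the inverse square root of the step. The function name `noam_lr` and the default values for `d_model`, `warmup_steps`, and `scale` are hypothetical placeholders; the model card does not state the actual training hyperparameters.

```python
def noam_lr(step, d_model=1024, warmup_steps=10000, scale=1.0):
    """Learning rate at a given step under the standard Noam schedule.

    lr = scale * d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)

    d_model, warmup_steps, and scale are placeholder values chosen for
    illustration, not the settings used to train this model.
    """
    step = max(step, 1)  # avoid division by zero at step 0
    warmup_term = step * warmup_steps ** -1.5  # linear ramp during warmup
    decay_term = step ** -0.5                  # inverse-sqrt decay afterwards
    return scale * d_model ** -0.5 * min(warmup_term, decay_term)


if __name__ == "__main__":
    # Print the rate at a few points across a 1-million-step run.
    for s in (1, 5000, 10000, 40000, 1000000):
        print(s, noam_lr(s))
```

The two terms meet at `step == warmup_steps`, which is where the rate peaks; after that, quadrupling the step count halves the learning rate.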
Usage
We recommend using this model through our reazonspeech library.
from reazonspeech.nemo.asr import load_model, transcribe, audio_from_path
audio = audio_from_path("speech.wav")
model = load_model()
ret = transcribe(model, audio)
print(ret.text)
License