NeMo / examples /asr /README.md
camenduru's picture
thanks to NVIDIA ❤
7934b29
# Automatic Speech Recognition
This directory contains example scripts to train ASR models using various methods such as Connectionist Temporal Classification loss, RNN Transducer Loss.
Speech pre-training via self supervised learning, voice activity detection and other sub-domains are also included as part of this domain's examples.
# ASR Model inference execution overview
The inference scripts in this directory execute in the following order. When preparing your own inference scripts, please follow this order for correct inference.
```mermaid
graph TD
A[Hydra Overrides + Config Dataclass] --> B{Config}
B --> |Init| C[Model]
B --> |Init| D[Trainer]
C & D --> E[Set trainer]
E --> |Optional| F[Change Transducer Decoding Strategy]
F --> H[Load Manifest]
E --> |Skip| H
H --> I["model.transcribe(...)"]
I --> J[Write output manifest]
K[Ground Truth Manifest]
J & K --> |Optional| L[Evaluate CER/WER]
```
During restoration of the model, you may pass the Trainer to the restore_from / from_pretrained call, or set it after the model has been initialized by using `model.set_trainer(Trainer)`.