NeMo

File size: 1,150 Bytes

7934b29

# Automatic Speech Recognition

This directory contains example scripts to train ASR models using various methods such as Connectionist Temporal Classification loss, RNN Transducer Loss.

Speech pre-training via self supervised learning, voice activity detection and other sub-domains are also included as part of this domain's examples.

# ASR Model inference execution overview

The inference scripts in this directory execute in the following order. When preparing your own inference scripts, please follow this order for correct inference.

```mermaid

graph TD
    A[Hydra Overrides + Config Dataclass] --> B{Config}
    B --> |Init| C[Model]
    B --> |Init| D[Trainer]
    C & D --> E[Set trainer]
    E --> |Optional| F[Change Transducer Decoding Strategy]
    F --> H[Load Manifest]
    E --> |Skip| H
    H --> I["model.transcribe(...)"]
    I --> J[Write output manifest]
    K[Ground Truth Manifest]
    J & K --> |Optional| L[Evaluate CER/WER]

```

During restoration of the model, you may pass the Trainer to the restore_from / from_pretrained call, or set it after the model has been initialized by using `model.set_trainer(Trainer)`.