|
# Automatic Speech Recognition |
|
|
|
This directory contains example scripts to train ASR models using various methods such as Connectionist Temporal Classification loss, RNN Transducer Loss. |
|
|
|
Speech pre-training via self supervised learning, voice activity detection and other sub-domains are also included as part of this domain's examples. |
|
|
|
# ASR Model inference execution overview |
|
|
|
The inference scripts in this directory execute in the following order. When preparing your own inference scripts, please follow this order for correct inference. |
|
|
|
```mermaid |
|
|
|
graph TD |
|
A[Hydra Overrides + Config Dataclass] --> B{Config} |
|
B --> |Init| C[Model] |
|
B --> |Init| D[Trainer] |
|
C & D --> E[Set trainer] |
|
E --> |Optional| F[Change Transducer Decoding Strategy] |
|
F --> H[Load Manifest] |
|
E --> |Skip| H |
|
H --> I["model.transcribe(...)"] |
|
I --> J[Write output manifest] |
|
K[Ground Truth Manifest] |
|
J & K --> |Optional| L[Evaluate CER/WER] |
|
|
|
``` |
|
|
|
During restoration of the model, you may pass the Trainer to the restore_from / from_pretrained call, or set it after the model has been initialized by using `model.set_trainer(Trainer)`. |