# Automatic Speech Recognition
This directory contains example scripts for training ASR models with methods such as Connectionist Temporal Classification (CTC) loss and RNN Transducer (RNN-T) loss.
Examples for related sub-domains, such as speech pre-training via self-supervised learning and voice activity detection, are also included here.
# ASR Model inference execution overview
The inference scripts in this directory execute their steps in the order shown below. When writing your own inference scripts, follow the same order so that inference runs correctly.
```mermaid
graph TD
A[Hydra Overrides + Config Dataclass] --> B{Config}
B --> |Init| C[Model]
B --> |Init| D[Trainer]
C & D --> E[Set trainer]
E --> |Optional| F[Change Transducer Decoding Strategy]
F --> H[Load Manifest]
E --> |Skip| H
H --> I["model.transcribe(...)"]
I --> J[Write output manifest]
K[Ground Truth Manifest]
J & K --> |Optional| L[Evaluate CER/WER]
```
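The manifest and evaluation steps at the end of the diagram can be sketched in plain Python. This is a minimal illustration, not the code used by the scripts in this directory: it assumes manifests are JSON-lines files and that each entry stores the reference under `text` and the hypothesis under `pred_text`. Those field names, and the helper names `load_manifest`, `write_manifest`, `edit_distance`, and `error_rate`, are assumptions made for this example.

```python
import json
import os
import tempfile


def load_manifest(path):
    """Read a JSON-lines manifest: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_manifest(path, entries):
    """Write entries back out, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def error_rate(entries, use_cer=False):
    """Corpus-level WER (default) or CER: total edits / total reference length."""
    errors = total = 0
    for e in entries:
        # Assumed field names: "text" (reference), "pred_text" (hypothesis).
        ref = list(e["text"]) if use_cer else e["text"].split()
        hyp = list(e["pred_text"]) if use_cer else e["pred_text"].split()
        errors += edit_distance(ref, hyp)
        total += len(ref)
    return errors / total if total else 0.0


# Usage: round-trip a tiny, made-up manifest through a temporary directory.
entries = [{"audio_filepath": "a.wav", "text": "the cat sat", "pred_text": "the cat sit"}]
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "predictions.json")
    write_manifest(out_path, entries)
    reloaded = load_manifest(out_path)

wer = error_rate(reloaded)                # word-level
cer = error_rate(reloaded, use_cer=True)  # character-level
print(f"WER={wer:.3f} CER={cer:.3f}")
```

The error rate is aggregated over the whole manifest rather than averaged per utterance, so long utterances weigh more than short ones. Text normalization (casing, punctuation) is deliberately left out and would need to match whatever the real evaluation applies.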
During restoration of the model, you may pass the Trainer to the `restore_from` / `from_pretrained` call, or set it after the model has been initialized by using `model.set_trainer(Trainer)`.
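The two ways of attaching the Trainer can be sketched as a small helper. To keep the snippet runnable without the toolkit installed, it uses `StubModel`, a hypothetical stand-in that only mimics the `restore_from` and `set_trainer` methods named above. The keyword names `restore_path` and `trainer`, the helper `restore_with_trainer`, and the checkpoint filename are assumptions for this example, so check them against the model class you actually use.

```python
class StubModel:
    """Hypothetical stand-in for an ASR model class; not a real toolkit class."""

    def __init__(self, trainer=None):
        self.trainer = trainer

    @classmethod
    def restore_from(cls, restore_path, trainer=None):
        # A real implementation would load weights and config from restore_path.
        return cls(trainer=trainer)

    def set_trainer(self, trainer):
        self.trainer = trainer


def restore_with_trainer(model_cls, restore_path, trainer, attach_late=False):
    """Restore a model and make sure it ends up with the given trainer."""
    if attach_late:
        # Option 2: restore first, then attach the trainer.
        model = model_cls.restore_from(restore_path=restore_path)
        model.set_trainer(trainer)
    else:
        # Option 1: hand the trainer to the restore call itself.
        model = model_cls.restore_from(restore_path=restore_path, trainer=trainer)
    return model


# Usage with a placeholder trainer object and a made-up checkpoint name.
trainer = object()
early = restore_with_trainer(StubModel, "model.ckpt", trainer)
late = restore_with_trainer(StubModel, "model.ckpt", trainer, attach_late=True)
print(early.trainer is trainer, late.trainer is trainer)
```

Either path leaves the model holding the same trainer. In the diagram, the "Set trainer" step corresponds to the second option.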