Getting a tokenizer-related error when loading the model

#1
by Narenameme - opened

Code I'm running:

import torch
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.restore_from(restore_path="indicconformer_stt_ta_hybrid_rnnt_large.nemo")
print("Model loaded")

Error I'm getting:

[NeMo E 2025-01-22 16:33:25 nemo_logging:417] Model instantiation failed!
Target class: nemo.collections.asr.models.hybrid_rnnt_ctc_bpe_models.EncDecHybridRNNTCTCBPEModel
Error(s): 'dir'
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/nemo/core/classes/common.py", line 508, in from_config_dict
    instance = imported_cls(cfg=config, trainer=trainer)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/nemo/collections/asr/models/hybrid_rnnt_ctc_bpe_models.py", line 56, in __init__
    self._setup_tokenizer(cfg.tokenizer)
  File "/usr/local/lib/python3.11/dist-packages/nemo/collections/asr/parts/mixins/mixins.py", line 65, in _setup_tokenizer
    self._setup_monolingual_tokenizer(tokenizer_cfg)
  File "/usr/local/lib/python3.11/dist-packages/nemo/collections/asr/parts/mixins/mixins.py", line 75, in _setup_monolingual_tokenizer
    self.tokenizer_dir = self.tokenizer_cfg.pop('dir') # Remove tokenizer directory
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'dir'

Environment:

Google Colab
Python 3.11.11
nemo_toolkit 2.1.0
pytorch 2.5.1+cu121

Possible solution:
https://colab.research.google.com/drive/1yG_WBXFQV3l11vjpM8fOsrhq6ojtpwKR?usp=sharing#scrollTo=9OWPnoFOwOZR

Following the above notebook step by step resolved the error for me.
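
For anyone else hitting the same KeyError, here is a rough sketch for checking what the checkpoint's tokenizer config actually contains before calling restore_from. It assumes the .nemo file is a tar archive with a model_config.yaml inside and that the YAML parses with PyYAML's safe_load. The helper names read_tokenizer_cfg and missing_tokenizer_keys are my own, not part of NeMo, and the only required key I check for is 'dir', because that is the one the traceback complains about.

```python
import tarfile


def read_tokenizer_cfg(nemo_path):
    """Return the `tokenizer` section of model_config.yaml from a .nemo file.

    Assumption: the checkpoint is a tar archive (optionally compressed)
    that contains a file named model_config.yaml.
    """
    import yaml  # PyYAML; imported here so the helper below works without it

    with tarfile.open(nemo_path, "r:*") as tar:
        for member in tar.getmembers():
            if member.name.endswith("model_config.yaml"):
                cfg = yaml.safe_load(tar.extractfile(member))
                return cfg.get("tokenizer") or {}
    raise FileNotFoundError("model_config.yaml not found in " + str(nemo_path))


def missing_tokenizer_keys(tokenizer_cfg, required=("dir",)):
    """List the required keys that are absent from the tokenizer config.

    The default comes from the traceback above, where the installed NeMo
    calls tokenizer_cfg.pop('dir') and fails.
    """
    return [key for key in required if key not in tokenizer_cfg]


if __name__ == "__main__":
    # Path taken from the snippet at the top of this post.
    tok_cfg = read_tokenizer_cfg("indicconformer_stt_ta_hybrid_rnnt_large.nemo")
    print("tokenizer keys:", sorted(tok_cfg))
    print("missing:", missing_tokenizer_keys(tok_cfg))
```

If 'dir' shows up as missing, the checkpoint's tokenizer section is laid out differently from what the installed nemo_toolkit expects, which would be consistent with the notebook's setup being what fixed it.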
