Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 83 column 3

#25
by h3clikejava

/home/h3c/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
2025-01-07 15:32:03.162136: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-07 15:32:03.173274: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-07 15:32:03.176658: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-07 15:32:03.185350: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-07 15:32:03.787266: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/home/h3c/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
/home/h3c/.local/lib/python3.10/site-packages/flash_attn/ops/triton/layer_norm.py:959: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
def forward(
/home/h3c/.local/lib/python3.10/site-packages/flash_attn/ops/triton/layer_norm.py:1018: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
def backward(ctx, dout, *args):
Traceback (most recent call last):
  File "/home/h3c/Desktop/a.py", line 27, in <module>
    text_embeddings = model.encode_text(sentences, truncate_dim=truncate_dim)
  File "/home/h3c/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/h3c/.cache/huggingface/modules/transformers_modules/jinaai/jina-clip-implementation/51f02de9f2cf8afcd3bac4ce996859ba96f9f8e9/modeling_clip.py", line 565, in encode_text
    self.tokenizer = self.get_tokenizer()
  File "/home/h3c/.cache/huggingface/modules/transformers_modules/jinaai/jina-clip-implementation/51f02de9f2cf8afcd3bac4ce996859ba96f9f8e9/modeling_clip.py", line 333, in get_tokenizer
    self.tokenizer = AutoTokenizer.from_pretrained(
  File "/home/h3c/.local/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 814, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/h3c/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2029, in from_pretrained
    return cls._from_pretrained(
  File "/home/h3c/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2261, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/h3c/.local/lib/python3.10/site-packages/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py", line 155, in __init__
    super().__init__(
  File "/home/h3c/.local/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 111, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 83 column 3
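For context, here is a minimal sketch of the kind of script that produces this traceback, reconstructed from the call chain above. Only the `model.encode_text(sentences, truncate_dim=truncate_dim)` call is taken from the traceback; the model id, example sentences, and truncation dimension are assumptions for illustration.

```python
# Hypothetical repro reconstructed from the traceback; model id and inputs are assumed.
from transformers import AutoModel

# jina-clip loads custom modeling code from the Hub, so trust_remote_code is required.
model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)

sentences = ["A photo of a cat", "A photo of a dog"]  # assumed example inputs
truncate_dim = 512                                    # assumed truncation dimension

# encode_text lazily builds the tokenizer via AutoTokenizer.from_pretrained;
# that is where TokenizerFast.from_file raises the untagged-enum exception
# when parsing tokenizer.json.
text_embeddings = model.encode_text(sentences, truncate_dim=truncate_dim)
```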

Jina AI org

Hey @h3clikejava, would you mind sharing the code snippet and the transformers version?
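If it helps, a short snippet like the one below can collect the versions most relevant to this tokenizer-loading error; the package list is a suggestion, not an exhaustive set.

```python
# Print the installed versions of the libraries involved in loading the tokenizer.
import transformers
import tokenizers
import huggingface_hub

print("transformers   ", transformers.__version__)
print("tokenizers     ", tokenizers.__version__)
print("huggingface_hub", huggingface_hub.__version__)
```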
