Error loading model into Text Generation Web UI (ooba)

#11, opened by viktor-ferenczi

I'm trying to load the model into the Text Generation Web UI (ooba).

The weights load onto the GPU properly, but at the end of loading, while the tokenizer is being loaded, it crashes:

Traceback (most recent call last):
  File "/workspace/text-generation-webui/server.py", line 67, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/workspace/text-generation-webui/modules/models.py", line 82, in load_model
    tokenizer = load_tokenizer(model_name, model)
  File "/workspace/text-generation-webui/modules/models.py", line 107, in load_tokenizer
    tokenizer = LlamaTokenizer.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}/"), clean_up_tokenization_spaces=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 1825, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 1988, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
    self.sp_model.Load(vocab_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string

The same instance worked with all the other Transformers models I tried today (Falcon-40B, MPT-30B, WizardCoder).

I found the same issue reported in the discussion of another model:
https://huggingface.co./TheBloke/stable-vicuna-13B-GPTQ/discussions/15

The root cause is most likely that the tokenizer.model file is missing; it is not among the downloadable files either.
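To confirm this on a local copy, it is enough to check the model folder for the SentencePiece file that LlamaTokenizer reads. When tokenizer.model is absent, the vocab_file handed to SentencePiece is presumably None rather than a path, which would be consistent with `TypeError: not a string`. A minimal sketch; the helper name `missing_tokenizer_files` and the folder path are hypothetical:

```python
from pathlib import Path

# File that LlamaTokenizer (SentencePiece-based) expects in the model folder.
SENTENCEPIECE_FILES = ("tokenizer.model",)


def missing_tokenizer_files(model_dir, required=SENTENCEPIECE_FILES):
    """Return the names of required tokenizer files absent from model_dir.

    Hypothetical diagnostic helper: it only checks for files, it loads nothing.
    """
    folder = Path(model_dir)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    # Hypothetical path: adjust it to your own ooba models directory.
    missing = missing_tokenizer_files("models/Salesforce_xgen-7b-8k-base")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All SentencePiece tokenizer files are present.")
```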

Salesforce org

It seems the web UI assumes LlamaTokenizer, but we do not use it; we use OpenAI's tiktoken library instead.
If you would like to integrate the model into text-generation-webui, please replace

 File "/workspace/text-generation-webui/modules/models.py", line 107, in load_tokenizer
   tokenizer = LlamaTokenizer.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}/"), clean_up_tokenization_spaces=True)

with AutoTokenizer, following the instructions on the model card.

Thank you very much for looking into this. I will make the above modification and try again. (I've been comparing medium-sized language models for simple coding tasks.)

I made the suggested modification and got this error when loading the model:

Traceback (most recent call last):
  File "C:\Dev\LLM\oobabooga_windows\text-generation-webui\server.py", line 1154, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Dev\LLM\oobabooga_windows\text-generation-webui\modules\models.py", line 82, in load_model
    tokenizer = load_tokenizer(model_name, model)
  File "C:\Dev\LLM\oobabooga_windows\text-generation-webui\modules\models.py", line 107, in load_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}/"), clean_up_tokenization_spaces=True)
  File "C:\Dev\LLM\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 688, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class XgenTokenizer does not exist or is not currently imported.

XgenTokenizer is defined in tokenization_xgen.py, but AutoTokenizer somehow does not find it. Maybe a configuration issue?
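Custom tokenizers like this are usually declared in the checkpoint's tokenizer_config.json through an `auto_map` entry that points at a module inside the repository. As far as I can tell, transformers only imports that module when `trust_remote_code=True` is passed; otherwise it looks up the `tokenizer_class` name among its built-in classes and raises this ValueError. The sketch below reads the config to show which case applies. The exact contents of XGen's tokenizer_config.json are an assumption here, and `describe_tokenizer_config` is a hypothetical helper:

```python
import json
from pathlib import Path


def describe_tokenizer_config(model_dir):
    """Summarize how a checkpoint declares its tokenizer.

    Returns (tokenizer_class, custom_class_ref), where custom_class_ref is the
    "module.ClassName" string from auto_map["AutoTokenizer"], or None when the
    checkpoint relies on a tokenizer class built into transformers.
    """
    config_path = Path(model_dir) / "tokenizer_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    tokenizer_class = config.get("tokenizer_class")
    auto_entry = config.get("auto_map", {}).get("AutoTokenizer")
    # The entry may be a single string or a [slow, fast] pair with None gaps.
    if isinstance(auto_entry, (list, tuple)):
        auto_entry = next((ref for ref in auto_entry if ref), None)
    return tokenizer_class, auto_entry
```

If the second value is not None, the tokenizer class lives in a Python file shipped with the model, so loading it through AutoTokenizer would need remote code to be trusted.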

I also tried installing the tiktoken library into ooba's private Miniconda environment, but it made no difference.

@viktor-ferenczi try this: tokenizer = AutoTokenizer.from_pretrained("Salesforce/xgen-7b-8k-base", trust_remote_code=True)

I combined the above suggestions and changed line 107 of text-generation-webui\modules\models.py to:

tokenizer = AutoTokenizer.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}/"), clean_up_tokenization_spaces=True, trust_remote_code=True)

Leaving trust_remote_code=True on for every model is certainly not secure, since it lets any model folder run its bundled Python code, but it was enough to get the model loaded into ooba.
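One way to keep this workaround without trusting remote code for every model is to opt in per model. A minimal sketch, assuming a hand-maintained allowlist of models whose bundled code you have reviewed; `TRUSTED_REMOTE_CODE_MODELS`, `tokenizer_kwargs` and the folder name below are hypothetical and not part of text-generation-webui:

```python
# Hypothetical allowlist of model folder names whose custom code was reviewed.
TRUSTED_REMOTE_CODE_MODELS = {"Salesforce_xgen-7b-8k-base"}


def tokenizer_kwargs(model_name, trusted=TRUSTED_REMOTE_CODE_MODELS):
    """Build keyword arguments for AutoTokenizer.from_pretrained.

    trust_remote_code is enabled only for allowlisted models, so other
    checkpoints cannot execute bundled Python code when their tokenizer loads.
    """
    kwargs = {"clean_up_tokenization_spaces": True}
    if model_name in trusted:
        kwargs["trust_remote_code"] = True
    return kwargs
```

Line 107 would then read tokenizer = AutoTokenizer.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}/"), **tokenizer_kwargs(model_name)). The allowlist only narrows the exposure; the listed model's tokenization_xgen.py still runs with full privileges.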

Thank you for all the help!

viktor-ferenczi changed discussion title from Error loading model to Error loading model into Text Generation Web UI (ooba)

I have the same problem deploying the model (SageMaker or managed endpoints). Is there a solution for this scenario too?

It would be great to get some feedback on this too, as it is not easy to change the code in the text-generation-inference container images. (I would like to use this model for custom training, but I need to be sure I can deploy it easily afterwards.)

Here is a log extract from AWS SageMaker after calling huggingface_model.deploy():

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'XgenTokenizer'.
The class this function is called from is 'LlamaTokenizer'.

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 67, in serve
    server.serve(model_id, revision, sharded, quantize, trust_remote_code, uds_path)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 155, in serve
    asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
    model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 246, in get_model
    return llama_cls(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 44, in __init__
    tokenizer = LlamaTokenizer.from_pretrained(
  File "/usr/src/transformers/src/transformers/tokenization_utils_base.py", line 1812, in from_pretrained
    return cls._from_pretrained(
  File "/usr/src/transformers/src/transformers/tokenization_utils_base.py", line 1975, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/src/transformers/src/transformers/models/llama/tokenization_llama.py", line 96, in __init__
    self.sp_model.Load(vocab_file)
  File "/opt/conda/lib/python3.9/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/opt/conda/lib/python3.9/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)

Error: ShardCannotStart

Any help would be much appreciated :)
