Error loading model into Text Generation Web UI (ooba)
Trying to load the model into the Text Generation Web UI (ooba).
The weights load into the GPU properly. At the end of loading there is a crash:
Traceback (most recent call last):
  File "/workspace/text-generation-webui/server.py", line 67, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/workspace/text-generation-webui/modules/models.py", line 82, in load_model
    tokenizer = load_tokenizer(model_name, model)
  File "/workspace/text-generation-webui/modules/models.py", line 107, in load_tokenizer
    tokenizer = LlamaTokenizer.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}/"), clean_up_tokenization_spaces=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 1825, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 1988, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
    self.sp_model.Load(vocab_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
This same instance worked with all the other transformer models I tried today (Falcon-40B, MPT-30B, WizardCoder).
Found this issue in the discussion of another model:
https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ/discussions/15
The root cause is most likely that the tokenizer.model file is missing. It is not among the downloadable files either.
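For anyone hitting the same "TypeError: not a string", here is a minimal sketch of how one might confirm that the SentencePiece vocabulary file is absent before loading. The helper name has_sentencepiece_vocab is made up for illustration, and the only assumption is that LlamaTokenizer looks for a file called tokenizer.model in the model folder.

```python
from pathlib import Path


def has_sentencepiece_vocab(model_dir):
    """Return True if model_dir contains a tokenizer.model file.

    Hypothetical helper: when the file is absent, LlamaTokenizer ends up
    passing None to SentencePiece, which is consistent with the
    "TypeError: not a string" in the traceback above.
    """
    return (Path(model_dir) / "tokenizer.model").is_file()
```

Calling it on the model folder under shared.args.model_dir should return False for this checkpoint if the diagnosis above is right.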
It seems the system assumes LlamaTokenizer, but we do not use it; we use OpenAI's tiktoken library instead.
If you would like to integrate the model into text-generation-webui, please change line 107 of /workspace/text-generation-webui/modules/models.py (in load_tokenizer):
tokenizer = LlamaTokenizer.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}/"), clean_up_tokenization_spaces=True)
so that it uses AutoTokenizer instead of LlamaTokenizer, following the instructions on the model card.
Thank you very much for looking into this. I will make the above modification and try again. (I've been comparing medium-sized language models for simple coding tasks.)
Made the modification suggested above. Got this on loading the model:
Traceback (most recent call last):
  File "C:\Dev\LLM\oobabooga_windows\text-generation-webui\server.py", line 1154, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Dev\LLM\oobabooga_windows\text-generation-webui\modules\models.py", line 82, in load_model
    tokenizer = load_tokenizer(model_name, model)
  File "C:\Dev\LLM\oobabooga_windows\text-generation-webui\modules\models.py", line 107, in load_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}/"), clean_up_tokenization_spaces=True)
  File "C:\Dev\LLM\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 688, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class XgenTokenizer does not exist or is not currently imported.
The XgenTokenizer is defined in tokenization_xgen.py, but somehow the AutoTokenizer does not find it. Maybe a configuration issue?
Also tried to install the tiktoken library into ooba's private miniconda deployment, but it did not change anything.
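One way to narrow down a "configuration issue" like this is to look at what the checkpoint's tokenizer_config.json declares. The sketch below is only a guess at a useful check: it assumes the file uses the "tokenizer_class" and "auto_map" keys that transformers normally writes for custom tokenizers, which I have not verified for this repository. The function name describe_tokenizer_config is hypothetical.

```python
import json
from pathlib import Path


def describe_tokenizer_config(model_dir):
    """Summarize which tokenizer class a checkpoint folder declares.

    Assumption: a custom tokenizer shipped as a .py file is advertised via
    an "auto_map" entry for "AutoTokenizer"; such entries are only honored
    when the loader is allowed to run the repository's own code.
    """
    config_path = Path(model_dir) / "tokenizer_config.json"
    if not config_path.is_file():
        return {"tokenizer_class": None, "needs_remote_code": False}
    config = json.loads(config_path.read_text(encoding="utf-8"))
    auto_map = config.get("auto_map") or {}
    return {
        "tokenizer_class": config.get("tokenizer_class"),
        "needs_remote_code": "AutoTokenizer" in auto_map,
    }
```

If needs_remote_code comes back True, the class lives in the model folder rather than inside the transformers package, which would explain why a plain AutoTokenizer call cannot import it.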
@viktor-ferenczi try this: tokenizer = AutoTokenizer.from_pretrained("Salesforce/xgen-7b-8k-base", trust_remote_code=True)
Combined the above suggestions and changed line 107 of text-generation-webui\modules\models.py to:
tokenizer = AutoTokenizer.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}/"), clean_up_tokenization_spaces=True, trust_remote_code=True)
Certainly it is not secure to keep it like this, but it was enough to get the model loaded into ooba.
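A slightly safer variant, as a rough sketch only: enable trust_remote_code just for model folders that have been reviewed by hand instead of for every model. The names TRUSTED_MODELS and tokenizer_kwargs are hypothetical, and the folder name "Salesforce_xgen-7b-8k-base" is an assumption about how the model is stored locally.

```python
# Hypothetical allowlist of local model folder names whose bundled
# tokenizer code has been reviewed manually.
TRUSTED_MODELS = {"Salesforce_xgen-7b-8k-base"}


def tokenizer_kwargs(model_name, trusted_models=TRUSTED_MODELS):
    """Build keyword arguments for AutoTokenizer.from_pretrained.

    trust_remote_code is added only for allowlisted models, so every other
    model keeps the default behavior of not executing repository code.
    """
    kwargs = {"clean_up_tokenization_spaces": True}
    if model_name in trusted_models:
        kwargs["trust_remote_code"] = True
    return kwargs
```

Line 107 could then pass **tokenizer_kwargs(model_name) to AutoTokenizer.from_pretrained instead of hard-coding trust_remote_code=True.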
Thank you for all the help!
I have the same problem when deploying the model (SageMaker or managed endpoints). Is there a solution for this scenario too?
It would be super nice to have some feedback on this too as it is not easy to change the code on the container images for text-generation-inference. (I would like to use this model for custom training but I need to be sure I can deploy it easily after training)
This is a log extract from AWS SageMaker after the invocation of huggingface_model.deploy():
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'XgenTokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 67, in serve
    server.serve(model_id, revision, sharded, quantize, trust_remote_code, uds_path)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 155, in serve
    asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
Error: ShardCannotStart
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
    model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 246, in get_model
    return llama_cls(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 44, in __init__
    tokenizer = LlamaTokenizer.from_pretrained(
  File "/usr/src/transformers/src/transformers/tokenization_utils_base.py", line 1812, in from_pretrained
    return cls._from_pretrained(
  File "/usr/src/transformers/src/transformers/tokenization_utils_base.py", line 1975, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/src/transformers/src/transformers/models/llama/tokenization_llama.py", line 96, in __init__
    self.sp_model.Load(vocab_file)
  File "/opt/conda/lib/python3.9/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/opt/conda/lib/python3.9/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
Any help would be much appreciated :)