Do I just copy over the config, tokenizer_config, and tokenizer.model from another 13B Vicuna Model to get this working in ooba? I am getting an error.

#17
by Goldenblood56 - opened

Do I just copy over the config, tokenizer_config, and tokenizer.model from another 13B Vicuna Model to get this working in ooba?
I think that's what I did before: I took "vicuna-13b-free-V4.3-4bit-128g" and put it in a folder, then copied the config and tokenizer files (or whatever they are called) over from my other model, "vicuna-13b-GPTQ-4bit-128g", and it worked fine.

If I'm not supposed to do this and need a more specific config, tokenizer_config, and tokenizer.model, please let me know. Also, are those the only three files I need? Thank you for all the hard work.
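The copy step described above can be sketched in Python. This is a hypothetical helper, not part of text-generation-webui; the function name and the folder paths in the usage line are assumptions based on the model names in the post. It refuses to copy anything if the donor folder lacks one of the three files, so the target folder is never left half-populated. Keep in mind that config.json defines the model architecture (including vocab_size), so a donor config that does not match the checkpoint can still fail to load.

```python
import shutil
from pathlib import Path

# The three files discussed in this thread. config.json describes the
# architecture (e.g. vocab_size); the other two describe the tokenizer.
SUPPORT_FILES = ("config.json", "tokenizer_config.json", "tokenizer.model")


def copy_support_files(donor_dir, target_dir):
    """Copy the config/tokenizer files from donor_dir into target_dir.

    Hypothetical helper. Returns the list of copied file names and raises
    FileNotFoundError before copying anything if the donor is incomplete.
    """
    donor = Path(donor_dir)
    target = Path(target_dir)

    # Check every file first so a failure never leaves a partial copy.
    missing = [name for name in SUPPORT_FILES if not (donor / name).is_file()]
    if missing:
        raise FileNotFoundError(f"donor folder is missing: {', '.join(missing)}")

    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in SUPPORT_FILES:
        # copy2 also preserves file timestamps.
        shutil.copy2(donor / name, target / name)
        copied.append(name)
    return copied
```

Example usage, assuming both models live under the webui's `models` folder: `copy_support_files(r"models\vicuna-13b-GPTQ-4bit-128g", r"models\vicuna-13b-free_4.3")`.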

Wait, never mind; maybe there is something else I need to do? I am getting an error. Do I need newer weights for this 1.1 model or something?
My launch command is: call python server.py --auto-devices --chat --model vicuna-13b-free_4.3 --wbits 4 --groupsize 128

Starting the web UI...
Gradio HTTP request redirected to localhost :)
Loading vicuna-13b-free_4.3...
Found the following quantized model: models\vicuna-13b-free_4.3\vicuna-13b-free-V4.3-4bit-128g.safetensors
Loading model ...
Traceback (most recent call last):
  File "C:\AI\oobabooga-windowsBest\text-generation-webui\server.py", line 914, in
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\AI\oobabooga-windowsBest\text-generation-webui\modules\models.py", line 158, in load_model
    model = load_quantized(model_name)
  File "C:\AI\oobabooga-windowsBest\text-generation-webui\modules\GPTQ_loader.py", line 176, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "C:\AI\oobabooga-windowsBest\text-generation-webui\modules\GPTQ_loader.py", line 77, in _load_quant
    model.load_state_dict(safe_load(checkpoint), strict=False)
  File "C:\AI\oobabooga-windowsBest\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
    size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([32000, 5120]) from checkpoint, the shape in current model is torch.Size([32001, 5120]).
    size mismatch for lm_head.weight: copying a param with shape torch.Size([32000, 5120]) from checkpoint, the shape in current model is torch.Size([32001, 5120]).
Press any key to continue . . .
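The last two lines of the traceback say the checkpoint's embedding and output matrices have 32000 rows, while the model built from config.json expects 32001. That points to a likely (not confirmed in this thread) cause: the copied config.json declares vocab_size 32001, but this checkpoint was saved with 32000 tokens. A minimal stdlib-only sketch to check this before launching is below. The function names are hypothetical, and the tensor name `model.embed_tokens.weight` is taken from the error message. It relies on the safetensors layout: an 8-byte little-endian length, followed by a JSON header that lists each tensor's shape, so the multi-gigabyte weights never have to be loaded.

```python
import json
import struct
from pathlib import Path


def read_safetensors_shape(path, tensor_name):
    """Return one tensor's shape by reading only the safetensors header.

    The file begins with an unsigned 64-bit little-endian header length,
    followed by a JSON header mapping tensor names to dtype/shape/offsets.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header[tensor_name]["shape"]


def check_vocab_size(config_path, checkpoint_path,
                     tensor_name="model.embed_tokens.weight"):
    """Compare config.json's vocab_size with the checkpoint's embedding rows.

    Returns (matches, vocab_size_in_config, rows_in_checkpoint).
    """
    config = json.loads(Path(config_path).read_text())
    expected = config["vocab_size"]
    rows = read_safetensors_shape(checkpoint_path, tensor_name)[0]
    return expected == rows, expected, rows
```

Example usage with the paths from the log above: `check_vocab_size(r"models\vicuna-13b-free_4.3\config.json", r"models\vicuna-13b-free_4.3\vicuna-13b-free-V4.3-4bit-128g.safetensors")`. A result such as `(False, 32001, 32000)` would mean the config came from a model with one extra token, and a config whose vocab_size matches the checkpoint is needed.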

I will upload the tokenizer and config files soon. Generally, taking them from another Vicuna 13B model has worked, but just in case, and for convenience, I'll include them in this repository as well.

Thank you, Reeducator; that will rule out the chance of someone making a mistake and using the wrong ones. I think something in my config file might be causing this error, but I will wait until you post what is needed and see if my problem goes away.

I'm getting the same error, and I used the files that worked with v1.0 of this model.

OK, it does work using another similar model's files.

I'm getting the same error. I'll try the new files once they are uploaded. Thanks for all your hard work!

Just use the files from another model; I used these: https://huggingface.co./TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g/tree/main

That worked, thank you.

I added the tokenizer and config to the hf-output directory. Hopefully that should work.

config.json is needed as well.

Yeah, added.