Error during quantization
Just an FYI (I'm aware you made a GGML available yourself).
Exception: Vocab size mismatch (model has 32032, but I:\HF\Storage\NousResearch_Nous-Hermes-Llama2-13b\tokenizer.model combined with I:\HF\Storage\NousResearch_Nous-Hermes-Llama2-13b\added_tokens.json has 32001).
Same finding here.
Also, when I attempted quantization from the provided GGML fp16, I was notified that certain tensors aren't k-quant compatible because their dimensions are not a multiple of 256 - presumably also related to the vocab changes.
Yup, it doesn't seem to work with the 4-bit or 8-bit quantization offered through bitsandbytes either.
BnB on newer transformers can be fixed with "pretraining_tp": 1 in the config file
Same problem here.
config.json says "vocab_size": 32032
while largest id in tokenizer.json is 32000
Does anyone know how to solve this?
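A quick way to confirm the mismatch locally (a minimal sketch; the local model directory path is a placeholder, substitute your own):

```python
import json
from pathlib import Path
from transformers import AutoTokenizer

model_dir = Path("Nous-Hermes-Llama2-13b")  # hypothetical path to your downloaded model

config = json.loads((model_dir / "config.json").read_text())
tokenizer = AutoTokenizer.from_pretrained(model_dir)

print("config.json vocab_size:", config["vocab_size"])  # 32032 for this model
print("tokenizer size:", len(tokenizer))                 # 32001 here (32000 base + <pad>), per the error above
```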
Same problem here.
config.json says "vocab_size": 32032
while largest id in tokenizer.json is 32000
Does anyone know how to solve this?
You can add 32 dummy tokens to added_tokens.json to make it match the tensor size. Not sure the reason it's set up like this.
BnB on newer transformers can be fixed with "pretraining_tp": 1 in the config file
This is the real fix. It's an issue on Hugging Face's end, and it broke a lot of the Llama 2 finetunes dropped that day.
The fix has been pushed to the model, so you can just download the new config.json.
If you're still having issues, you can do the dummy-token thing, but it's not recommended.
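If you'd rather patch an already-downloaded copy than re-download, a minimal sketch (the model directory is an assumption, substitute your own path):

```python
import json
from pathlib import Path

config_path = Path("Nous-Hermes-Llama2-13b") / "config.json"  # hypothetical local model directory

config = json.loads(config_path.read_text())
config["pretraining_tp"] = 1  # the fix described above for BnB on newer transformers
config_path.write_text(json.dumps(config, indent=2))
```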
I upgraded transformers and bitsandbytes to the latest versions, but I'm still getting a vocab size mismatch when trying to run convert.py in llama.cpp. What am I missing?
The only solution I could find was to add a bunch of dummy tokens to added_tokens.json, which works, but it seems like a dumb fix that could lead to issues. Better than nothing, I guess.
The only solution I could find was to add a bunch of dummy tokens to added_tokens.json, which works, but it seems like a dumb fix that could lead to issues. Better than nothing, I guess.
Please tell me, how do I add a bunch of dummy tokens?
The only solution I could find was to add a bunch of dummy tokens to added_tokens.json, which works, but it seems like a dumb fix that could lead to issues. Better than nothing, I guess.
Please tell me, how do I add a bunch of dummy tokens?
This is my added_tokens.json file, with dummy tokens added to bring the total to 32032 tokens:
{"<pad>": 32000, "<pad1>": 32001, "<pad2>": 32002, "<pad3>": 32003, "<pad4>": 32004, "<pad5>": 32005, "<pad6>": 32006, "<pad7>": 32007, "<pad8>": 32008, "<pad9>": 32009, "<pad10>": 32010, "<pad11>": 32011, "<pad12>": 32012, "<pad13>": 32013, "<pad14>": 32014, "<pad15>": 32015, "<pad16>": 32016, "<pad17>": 32017, "<pad18>": 32018, "<pad19>": 32019, "<pad20>": 32020, "<pad21>": 32021, "<pad22>": 32022, "<pad23>": 32023, "<pad24>": 32024, "<pad25>": 32025, "<pad26>": 32026, "<pad27>": 32027, "<pad28>": 32028, "<pad29>": 32029, "<pad30>": 32030,"<pad31>": 32031}```
Same problem here.
config.json says "vocab_size": 32032
while largest id in tokenizer.json is 32000
Does anyone know how to solve this?
You can add 32 dummy tokens to added_tokens.json to make it match the tensor size. Not sure the reason it's set up like this.
Seems it was the trainer we used, axolotl. It has been fixed in the trainer, but I still don't know how to fix it here.
Python script to generate a valid tokenizer.model:
from pathlib import Path
from transformers import AutoTokenizer

tokenizer_model_name = 'NousResearch/Llama-2-7b-hf'
model_path = 'output'

# 32 dummy tokens (<pad>, <pad1> .. <pad31>) take the base 32000-token vocab up to 32032,
# matching the 32-entry added_tokens.json above
new_tokens = ["<pad>"] + [f"<pad{i}>" for i in range(1, 32)]

tokenizer = AutoTokenizer.from_pretrained(tokenizer_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
tokenizer.add_tokens(new_tokens)

# writes the tokenizer config and the sentencepiece tokenizer.model into model_path
tokenizer.save_pretrained(Path(model_path))
tokenizer.save_vocabulary(model_path)
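As a sanity check after running it, the saved tokenizer should report 32032 tokens (a sketch, assuming the same 'output' directory used in the script):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("output")  # the model_path from the script above
print(len(tokenizer))  # should print 32032, matching vocab_size in config.json
```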