Vocab size mismatch?

#3
by DaTruAndi

I applied the XOR, checked the MD5s, and then when I was running a ggml conversion script I stumbled over this:

Vocab size mismatch (model has 32016, but /AI/text-generation-webui/models/oasst-sft-7-llama-30b/tokenizer.model combined with /AI/text-generation-webui/models/oasst-sft-7-llama-30b/added_tokens.json has 32005).

Wondering if this is a problem in the conversion script or if there is really a mismatch in the vocab size?
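For reference, here is a minimal diagnostic sketch (not part of the conversion script itself; the path is just the one from the error message, and it assumes `sentencepiece` is installed) that reproduces the 32005 count the converter is comparing against:

```python
import json
import sentencepiece as spm

MODEL_DIR = "/AI/text-generation-webui/models/oasst-sft-7-llama-30b"

# Base vocab from the SentencePiece model
sp = spm.SentencePieceProcessor()
sp.Load(f"{MODEL_DIR}/tokenizer.model")
base_vocab = sp.GetPieceSize()

# Extra tokens layered on top of the base vocab
with open(f"{MODEL_DIR}/added_tokens.json") as f:
    added = json.load(f)

combined = base_vocab + len(added)
print(f"tokenizer.model: {base_vocab}, added_tokens.json: {len(added)}, combined: {combined}")
# The converter compares this combined count to the number of embedding rows
# in the checkpoint (32016 here) and aborts on a mismatch.
```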

You can fix it with the solution at https://huggingface.co./OpenAssistant/oasst-sft-6-llama-30b-xor/discussions/2
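In case the link moves, a rough sketch of one common way to close a gap like this (not necessarily the exact steps in that discussion) is to pad `added_tokens.json` with placeholder entries until the combined count matches the checkpoint's 32016 embedding rows. The placeholder token names below are hypothetical:

```python
import json
import sentencepiece as spm

MODEL_DIR = "/AI/text-generation-webui/models/oasst-sft-7-llama-30b"
TARGET_VOCAB = 32016  # embedding rows reported by the conversion script

sp = spm.SentencePieceProcessor()
sp.Load(f"{MODEL_DIR}/tokenizer.model")

added_path = f"{MODEL_DIR}/added_tokens.json"
with open(added_path) as f:
    added = json.load(f)  # token string -> id mapping

# Hypothetical placeholder names; ids continue after the existing vocab
next_id = sp.GetPieceSize() + len(added)
while next_id < TARGET_VOCAB:
    added[f"<pad_{next_id}>"] = next_id
    next_id += 1

with open(added_path, "w") as f:
    json.dump(added, f, indent=2)
```

Back up `added_tokens.json` before editing it; if the converter still complains, follow the linked discussion for the authoritative fix.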
