Tokenizer mismatch all the time

#47
by tian9 - opened

Hello, I want to change LLaVA's base model from Llama 2 to Llama 3, and I ran into this problem while fine-tuning the pretrained model:

[Screenshot attached: image.png]
Why does all the tokenization become 1? What's wrong here?

Is it related?

The original Llama 3 8B (base) model has all-zero embedding weights for its special tokens, which might cause NaN gradients. This version re-initialized the weights of all the following special tokens to alleviate the problem.
https://huggingface.co/imone/Llama-3-8B-fixed-special-embedding
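As a rough illustration of the re-initialization described above, here is a minimal pure-Python sketch that finds all-zero rows in an embedding matrix and overwrites them with the mean of the remaining rows. Mean initialization is an assumption chosen for the example; this thread does not say how the linked checkpoint actually re-initialized its special tokens. The helpers `find_zero_rows` and `reinit_zero_rows_with_mean` are hypothetical, and on a real model the matrix would be a weight tensor (for example the one returned by `model.get_input_embeddings()` in `transformers`), not a list of lists.

```python
from typing import List

# A plain list-of-lists stands in for an embedding weight tensor.
Matrix = List[List[float]]


def find_zero_rows(emb: Matrix, tol: float = 0.0) -> List[int]:
    """Return indices (ascending) of rows whose entries are all within `tol` of zero."""
    return [i for i, row in enumerate(emb) if all(abs(v) <= tol for v in row)]


def reinit_zero_rows_with_mean(emb: Matrix, tol: float = 0.0) -> List[int]:
    """Overwrite all-zero rows in place with the mean of the non-zero rows.

    Mean initialization is an illustrative choice (an assumption), not
    necessarily the method used for the checkpoint linked above.
    Returns the indices that were re-initialized.
    """
    zero_rows = find_zero_rows(emb, tol)
    if not zero_rows:
        return []
    zero_set = set(zero_rows)
    trained = [row for i, row in enumerate(emb) if i not in zero_set]
    if not trained:
        raise ValueError("every row is zero; there is nothing to average")
    dim = len(trained[0])
    # Column-wise mean over the rows that were actually trained.
    mean = [sum(row[j] for row in trained) / len(trained) for j in range(dim)]
    for i in zero_rows:
        emb[i] = list(mean)  # copy so re-initialized rows don't alias each other
    return zero_rows
```

Using the mean keeps the new rows in the same region of embedding space as real tokens, so they are neither zero nor wildly out of scale; random initialization with a small standard deviation is another common option.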

Meta Llama org

I have no idea what your tokenization mismatch is, but make sure that the tokenizer you are using is of the PreTrainedTokenizerFast class, not LlamaTokenizerFast.
It should be completely possible otherwise!
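A minimal sketch of the check suggested above, assuming the Hugging Face `transformers` library. Because `LlamaTokenizerFast` is a subclass of `PreTrainedTokenizerFast`, an `isinstance` test would accept both, so the hypothetical helper `is_plain_fast_tokenizer` compares the concrete class name instead. `load_and_check` and its `model_path` argument are likewise illustrative, not part of any library.

```python
def is_plain_fast_tokenizer(tokenizer) -> bool:
    """Return True only if `tokenizer` is exactly a PreTrainedTokenizerFast.

    LlamaTokenizerFast subclasses PreTrainedTokenizerFast, so isinstance()
    would accept both; compare the concrete class name instead.
    """
    return type(tokenizer).__name__ == "PreTrainedTokenizerFast"


def load_and_check(model_path: str):
    """Load a tokenizer and fail loudly if it is not the plain fast class.

    Hypothetical helper; requires `transformers` and access to the checkpoint.
    The import is local so `is_plain_fast_tokenizer` works without it.
    """
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    if not is_plain_fast_tokenizer(tokenizer):
        raise TypeError(
            f"Expected PreTrainedTokenizerFast, got {type(tokenizer).__name__}"
        )
    return tokenizer
```

For example, calling `load_and_check` on a local Llama 3 checkpoint directory before building the LLaVA training pipeline would surface a wrong tokenizer class immediately rather than as a mismatch warning during fine-tuning.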
