tokenizer.pad_token_id is None, but pad_sequence requires a float padding_value
Hello, I want to change the LLaVA base model from Llama 2 to Llama 3, but I get an error when executing these lines:
input_ids = torch.nn.utils.rnn.pad_sequence(
    input_ids,
    batch_first=True,
    padding_value=self.tokenizer.pad_token_id)
The Llama 3 tokenizer's pad_token_id is None, which is not a valid padding_value for this function. How can I resolve this?
The model and tokenizer are loaded correctly.
The usual trick, which also applies here, is to reuse the EOS token as the pad token, e.g.:
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
After that, tokenizer.pad_token_id automatically resolves to tokenizer.eos_token_id.
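To make the fix concrete without downloading a model, here is a minimal, dependency-free sketch. StubTokenizer and pad_batch are hypothetical stand-ins (assumptions, not Hugging Face or PyTorch APIs): the stub derives pad_token_id from pad_token the way a tokenizer does, so it stays None until pad_token is set, and pad_batch right-pads to the longest sequence, which is what pad_sequence(batch_first=True) does by default. The vocabulary and token ids are illustrative only.

```python
from typing import Dict, List, Optional


class StubTokenizer:
    """Hypothetical stand-in for a tokenizer (assumption, not a real API).

    pad_token_id is looked up from pad_token, so it is None until
    pad_token is assigned, mirroring the situation in the question.
    """

    def __init__(self, vocab: Dict[str, int], eos_token: str,
                 pad_token: Optional[str] = None) -> None:
        self.vocab = vocab
        self.eos_token = eos_token
        self.pad_token = pad_token

    @property
    def eos_token_id(self) -> int:
        return self.vocab[self.eos_token]

    @property
    def pad_token_id(self) -> Optional[int]:
        return None if self.pad_token is None else self.vocab[self.pad_token]


def pad_batch(sequences: List[List[int]],
              padding_value: Optional[int]) -> List[List[int]]:
    """Hypothetical helper: right-pad every sequence to the longest length."""
    if padding_value is None:
        # Refuse None, just as pad_sequence does in the question.
        raise TypeError("padding_value must be a number, got None")
    max_len = max(len(seq) for seq in sequences)
    return [seq + [padding_value] * (max_len - len(seq)) for seq in sequences]


# Illustrative vocabulary; ids are made up for this sketch.
tokenizer = StubTokenizer({"<bos>": 0, "<eos>": 1, "hello": 2, "world": 3},
                          eos_token="<eos>")
assert tokenizer.pad_token_id is None  # no pad token defined yet

# The fix from the answer: fall back to EOS only if no pad token exists.
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token

batch = pad_batch([[0, 2, 3], [0, 2]], padding_value=tokenizer.pad_token_id)
print(batch)  # [[0, 2, 3], [0, 2, 1]]
```

The `or` keeps any pad token the tokenizer already defines and only falls back to EOS when it is missing, so the same line is safe to run for both Llama 2 and Llama 3 tokenizers.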
Thanks a lot! Your answer is very helpful.