Suggested tokenizer changes similar to Phi-4

#8
by l2dy - opened

The tokenizer_config.json for Phi-4-mini-instruct also sets all three special tokens to the same string:

  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "pad_token": "<|endoftext|>",

In Phi-4, these were changed to use different strings: https://huggingface.co/microsoft/phi-4/commit/6fbb3d3bbe726c99b4188087b4deeec1bceac5ae

Does it make sense to apply similar changes to Phi-4-mini-instruct?
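
For anyone who wants to check a config locally, here is a minimal sketch that reports which special-token roles share one string. It assumes the config has already been loaded as a dict; `shared_special_tokens` is a hypothetical helper written for this post, not part of any library, and the `unk_token` role and the dict-with-`content` handling are assumptions about how other configs may look.

```python
import json
from collections import defaultdict


def shared_special_tokens(config):
    """Return {token_string: [roles]} for strings used by more than one role.

    Hypothetical helper for illustration only.
    """
    roles_by_token = defaultdict(list)
    for role in ("bos_token", "eos_token", "pad_token", "unk_token"):
        value = config.get(role)
        # Assumption: some configs serialize a token as a dict with a
        # "content" field instead of a plain string.
        if isinstance(value, dict):
            value = value.get("content")
        if value is not None:
            roles_by_token[value].append(role)
    # Keep only strings that more than one role maps to.
    return {tok: roles for tok, roles in roles_by_token.items() if len(roles) > 1}


# The three entries quoted above, wrapped in braces so they parse as JSON.
config = json.loads("""{
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "pad_token": "<|endoftext|>"
}""")

print(shared_special_tokens(config))
# → {'<|endoftext|>': ['bos_token', 'eos_token', 'pad_token']}
```

An empty result would mean every role has its own string.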
