Special tokens don't match between draft model and target model

#1
by Nindaleth - opened

Hi, thanks for making this model! I'd love to compare it to the non-fine-tuned Qwen2.5-Coder-0.5B, but unfortunately llama.cpp won't accept it as a matching draft model:

common_speculative_are_compatible: draft model special tokens must match target model to use speculation
common_speculative_are_compatible: tgt: bos = 151643 (0), eos = 151645 (0)
common_speculative_are_compatible: dft: bos = 11 (0), eos = 151645 (0)
srv    load_model: the draft model './models/Qwen2.5-Coder-0.5B-QwQ-draft.Q8_0.gguf' is not compatible with the target model './models/QwQ-32B-Preview-Q4_K_M.gguf'
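For reference, the gate llama.cpp applies here boils down to comparing the special-token IDs of the two vocabularies; a minimal Python sketch of that logic (the function name is illustrative — the real check is `common_speculative_are_compatible` in C++ and also compares other vocab properties):

```python
# Sketch of the special-token part of llama.cpp's draft/target
# compatibility check: speculation requires both models to tokenize
# identically, so bos/eos IDs must match exactly.

def special_tokens_match(tgt_bos: int, tgt_eos: int,
                         dft_bos: int, dft_eos: int) -> bool:
    """Return True if the draft model's special tokens match the target's."""
    return tgt_bos == dft_bos and tgt_eos == dft_eos

# Values from the log above: tgt bos=151643, dft bos=11 -> incompatible.
print(special_tokens_match(151643, 151645, 11, 151645))  # False
```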

I'm using QwQ quantized to Q4_K_M from this HF repo: bartowski/QwQ-32B-Preview-GGUF
And I'm using this model quantized to Q8_0 from this HF repo: MaziyarPanahi/Qwen2.5-Coder-0.5B-QwQ-draft-GGUF

My understanding is that quantization changes many things, but the token IDs should stay identical to the original model. Is this something that can be changed in your model?

Qwen models don't define any bos_token, so the quantizer/GGUF converter is probably setting one itself. In the log above, 151643 is `<|endoftext|>` and 11 is `,`. You should probably regenerate the GGUF and set `<|endoftext|>` as the bos token.
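Until the GGUF repo is regenerated, one way to fix the file locally (a sketch, assuming a llama.cpp checkout with its `gguf-py` package installed; the path and key are the standard ones, but verify the script location in your checkout) is to patch the draft model's metadata in place:

```shell
# Set the draft model's bos token ID to <|endoftext|> (151643) so it
# matches the target; gguf_set_metadata.py ships in llama.cpp's gguf-py.
python gguf-py/scripts/gguf_set_metadata.py \
  ./models/Qwen2.5-Coder-0.5B-QwQ-draft.Q8_0.gguf \
  tokenizer.ggml.bos_token_id 151643
```

This edits only the metadata key, not the tensors, so the quantized weights are untouched.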

tugstugi changed discussion status to closed

OK thanks, I'll ask the GGUF quant author if he can update his repo. I appreciate the speedy answer.
