Support for inference with MTP module?
#23 opened by yhh001
Hi, I was wondering whether this model can support MTP (multi-token prediction) inference. vLLM currently has an open issue where draft tokens are never accepted when using an AWQ checkpoint: https://github.com/vllm-project/vllm/issues/13704
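For reference, this is roughly the MTP path I'm trying to enable. This is only a sketch: it assumes a recent vLLM where `LLM(...)` accepts a `speculative_config` dict with the `deepseek_mtp` method (option names vary across vLLM versions, so check the docs for yours), and the model id is a placeholder for the AWQ repo.

```python
# Sketch: MTP-based speculative decoding in vLLM.
# Assumes a recent vLLM where speculative_config supports "deepseek_mtp";
# the model id below is a placeholder for the AWQ checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1",  # placeholder: substitute the AWQ repo
    tensor_parallel_size=8,           # adjust for your hardware
    speculative_config={
        "method": "deepseek_mtp",
        "num_speculative_tokens": 1,  # the MTP head drafts one extra token
    },
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```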
This model doesn't include the MTP head's weights, so no. Support is possible in principle, but this is an upstream issue: AutoAWQ doesn't support quantizing the MTP head yet.
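You can verify this yourself by listing the tensor names in the checkpoint's sharded-weights index. A minimal sketch below: the repo id is a placeholder, and the `model.layers.61.` prefix is an assumption based on the DeepSeek-V3-style layout, where the MTP module is stored as the layer after the 61 main layers.

```python
# Sketch: check whether a checkpoint retains its MTP head by listing the
# tensor names in model.safetensors.index.json (standard sharded index).
import json

from huggingface_hub import hf_hub_download

REPO_ID = "your-org/your-awq-checkpoint"  # placeholder repo id

index_path = hf_hub_download(REPO_ID, "model.safetensors.index.json")
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

# Assumption: DeepSeek-V3-style checkpoints keep MTP weights under
# model.layers.61.*; adjust the prefix for other architectures.
mtp_keys = [k for k in weight_map if k.startswith("model.layers.61.")]
print(f"MTP tensors found: {len(mtp_keys)}")
# An empty result means the MTP head was dropped, so MTP-based
# speculative decoding cannot work with this checkpoint.
```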
v2ray changed discussion status to closed