Support for inference with MTP module?

#23
by yhh001 - opened

Hi, I was wondering whether this model supports MTP inference. I see vLLM currently has an issue where draft tokens are never accepted when using an AWQ checkpoint: https://github.com/vllm-project/vllm/issues/13704
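For reference, this is roughly how MTP speculative decoding would be requested in vLLM. This is a minimal sketch, not a confirmed-working config: the repo id is hypothetical, and the `speculative_config` keys (`method`, `num_speculative_tokens`) follow recent vLLM versions, so check vLLM's speculative decoding docs for your version.

```python
# Sketch: requesting MTP-based speculative decoding in vLLM.
# "some-org/DeepSeek-V3-AWQ" is a placeholder repo id; the
# speculative_config keys assume a recent vLLM release.
from vllm import LLM, SamplingParams

llm = LLM(
    model="some-org/DeepSeek-V3-AWQ",  # hypothetical AWQ checkpoint
    quantization="awq",
    speculative_config={
        "method": "deepseek_mtp",      # assumed method name for MTP drafting
        "num_speculative_tokens": 1,
    },
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```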

Cognitive Computations org

This model doesn't include the MTP head's weights, so no. It would be possible to support, but this is really an upstream issue: AutoAWQ doesn't support quantizing the MTP head yet, so the head is dropped from the quantized checkpoint.
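You can verify this yourself by scanning the checkpoint's safetensors weight index for MTP weights. A minimal sketch, assuming the repo ships a `model.safetensors.index.json` and that the MTP module lives under a `model.layers.61.` prefix (as in the original DeepSeek-V3 layout; adjust both for the model you are checking):

```python
# Sketch: check whether a checkpoint contains MTP weights by listing
# the names in its safetensors weight index. The repo id is a
# placeholder and the MTP prefix is an assumption.
import json

from huggingface_hub import hf_hub_download

repo_id = "some-org/some-awq-model"  # hypothetical repo id
index_path = hf_hub_download(repo_id, "model.safetensors.index.json")

with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

mtp_prefix = "model.layers.61."  # assumed MTP-module prefix
mtp_weights = [name for name in weight_map if name.startswith(mtp_prefix)]

if mtp_weights:
    print(f"Found {len(mtp_weights)} MTP weights, e.g. {mtp_weights[0]}")
else:
    print("No MTP weights in this checkpoint; MTP drafting can't work.")
```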

v2ray changed discussion status to closed
