Support for inference with MTP module?

#23
by yhh001 - opened

Hi, I was wondering whether this model supports MTP inference. I see vLLM currently has an issue where draft tokens are never accepted when using an AWQ checkpoint: https://github.com/vllm-project/vllm/issues/13704
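For reference, this is roughly how MTP speculative decoding would be requested in vLLM. This is a minimal sketch, not a confirmed-working config: the repo id is hypothetical, and the `speculative_config` keys (`method`, `num_speculative_tokens`) follow recent vLLM versions, so check vLLM's speculative decoding docs for your version.

```python
# Sketch: requesting MTP-based speculative decoding in vLLM.
# "some-org/DeepSeek-V3-AWQ" is a placeholder repo id; the
# speculative_config keys assume a recent vLLM release.
from vllm import LLM, SamplingParams

llm = LLM(
    model="some-org/DeepSeek-V3-AWQ",  # hypothetical AWQ checkpoint
    quantization="awq",
    speculative_config={
        "method": "deepseek_mtp",      # assumed method name for MTP drafting
        "num_speculative_tokens": 1,
    },
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```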

Cognitive Computations org

This model doesn't include the MTP head's weights, so no. It would be possible to support, but this is really an upstream issue: AutoAWQ doesn't support quantizing the MTP head yet, so the head is dropped from the quantized checkpoint.
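You can verify this yourself by scanning the checkpoint's safetensors weight index for MTP weights. A minimal sketch, assuming the repo ships a `model.safetensors.index.json` and that the MTP module lives under a `model.layers.61.` prefix (as in the original DeepSeek-V3 layout; adjust both for the model you are checking):

```python
# Sketch: check whether a checkpoint contains MTP weights by listing
# the names in its safetensors weight index. The repo id is a
# placeholder and the MTP prefix is an assumption.
import json

from huggingface_hub import hf_hub_download

repo_id = "some-org/some-awq-model"  # hypothetical repo id
index_path = hf_hub_download(repo_id, "model.safetensors.index.json")

with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

mtp_prefix = "model.layers.61."  # assumed MTP-module prefix
mtp_weights = [name for name in weight_map if name.startswith(mtp_prefix)]

if mtp_weights:
    print(f"Found {len(mtp_weights)} MTP weights, e.g. {mtp_weights[0]}")
else:
    print("No MTP weights in this checkpoint; MTP drafting can't work.")
```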

v2ray changed discussion status to closed
