The quant doesn't load in vLLM / Aphrodite Engine
#1
by
av-codes
- opened
Unable to run this in either vLLM or Aphrodite. vLLM silently fails with a missing response from RPC engine, Aphrodite get stuck at aqlm_dequant. I assume both are silent errors in the underlying quantization library.
The 3.1 8B 1x16 loads in the same setup with vLLM, after fixing tokenizer config (correct EOS token + add missing chat template).