Unable to use FP8 KV cache with Neural Magic quants on Ampere

#3
by ndurkee - opened

Has anyone found a workaround for this? The problem is tracked upstream here:

https://github.com/vllm-project/vllm/issues/7714
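For context, a minimal invocation along these lines reproduces the failure on an Ampere GPU. The model name below is a placeholder, not a specific repo; the relevant pieces are the `--quantization compressed-tensors` and `--kv-cache-dtype fp8` flags, which is the combination the linked issue reports as broken on Ampere:

```shell
# Hypothetical reproduction sketch on an Ampere GPU (e.g. A100).
# The model name is a placeholder for any Neural Magic compressed-tensors quant.
# Requesting an FP8 KV cache via --kv-cache-dtype alongside compressed-tensors
# quantization is the combination that fails, per vllm-project/vllm#7714.
vllm serve neuralmagic/SOME-QUANTIZED-MODEL \
    --quantization compressed-tensors \
    --kv-cache-dtype fp8
```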
