Bug: rubbish output with continuous batching in vLLM

#6
by raghavgg - opened

I have opened a GitHub issue:
https://github.com/vllm-project/vllm/issues/14037

Basically, the model card suggests using:

flash_attn==2.7.4.post1
torch==2.6.0
vllm>=0.7.2

But installing vllm>=0.7.2 gives:

vllm 0.7.3 requires torch==2.5.1, but you have torch 2.6.0 which is incompatible.

I am getting rubbish output when doing continuous batch inference on long-context inputs (4-8k tokens) with vLLM.
Is anyone else facing a similar issue?
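
For context, the call pattern looks roughly like this (a minimal sketch; the model id, prompt contents, and sampling parameters are placeholders, not the exact setup from this thread):

from vllm import LLM, SamplingParams

# Model id is a placeholder for the checkpoint from this model card.
llm = LLM(model="microsoft/Phi-4-mini-instruct", max_model_len=8192)

# A batch of long prompts (the problematic 4-8k token range); contents are dummy text.
prompts = [
    ("Some long reference document. " * 800) + f"\n\nSummarize document {i}."
    for i in range(16)
]

# Illustrative sampling parameters only.
params = SamplingParams(temperature=0.0, max_tokens=256)

# vLLM schedules these requests with continuous batching internally.
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text[:200])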

Solved it :P
You need to use the pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel image.
First install vllm==0.7.3, and then reinstall torch 2.6.0 on top of it:

pip3 install torch==2.6.0 torchvision --index-url https://download.pytorch.org/whl/test/cu124
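
After that, a quick sanity check inside the container confirms the environment matches the model card recommendation (a minimal sketch that just prints the resolved versions):

import torch
import vllm
import flash_attn

# Expected per the model card: torch 2.6.0, vllm 0.7.3, flash_attn 2.7.4.post1
print("torch:", torch.__version__)
print("vllm:", vllm.__version__)
print("flash_attn:", flash_attn.__version__)
print("CUDA available:", torch.cuda.is_available())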

nguyenbh changed discussion status to closed
