Bug: garbage output with continuous batching in vLLM
#6
by raghavgg - opened
I have opened a GitHub issue:
https://github.com/vllm-project/vllm/issues/14037
Basically, the model card suggests using:
flash_attn==2.7.4.post1
torch==2.6.0
vllm>=0.7.2
But installing vllm>=0.7.2 gives:
vllm 0.7.3 requires torch==2.5.1, but you have torch 2.6.0 which is incompatible.
I am getting rubbish output when doing continuous-batch inference on long-context inputs (4-8k) with vLLM...
Is anyone else facing a similar issue?
Yet to solve it...
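For context, the failure mode above shows up in offline batched generation, which is where vLLM's continuous batching applies. Below is a rough sketch of that kind of call; the model id, prompt contents, and sampling settings are placeholder assumptions, not taken from this thread:

```python
# Rough sketch (not from the thread): offline batched generation with vLLM,
# which the engine serves via continuous batching. Model id, context length,
# and prompts are placeholder assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/Phi-4-mini-instruct", max_model_len=8192)

# A handful of long prompts (roughly 4-8k tokens each); a single generate()
# call lets the engine schedule them together.
prompts = [
    "Summarize the following text:\n" + "lorem ipsum dolor sit amet " * 1200
    for _ in range(8)
]
params = SamplingParams(temperature=0.0, max_tokens=256)

for out in llm.generate(prompts, params):
    print(out.outputs[0].text[:200])
```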
Solved it :P
You need to use the pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel image.
First install vllm==0.7.3,
and then install torch 2.6.0 on top of it:
pip3 install torch==2.6.0 torchvision --index-url https://download.pytorch.org/whl/test/cu124
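A quick way to confirm the resulting environment matches the versions above (a minimal sanity-check sketch, not part of the original post; the expected values in the comments come from this thread):

```python
# Minimal environment sanity check (sketch, not from the original post).
import torch
import vllm
import flash_attn

print("torch:", torch.__version__)            # expected 2.6.0
print("cuda:", torch.version.cuda)            # expected 12.4
print("vllm:", vllm.__version__)              # expected 0.7.3
print("flash_attn:", flash_attn.__version__)  # expected 2.7.4.post1
```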
nguyenbh changed discussion status to closed