Bug: rubbish output with continuous batching in vLLM

#6
by raghavgg - opened

I have opened a GitHub issue:
https://github.com/vllm-project/vllm/issues/14037

Basically, the model card suggests using:

flash_attn==2.7.4.post1
torch==2.6.0
vllm>=0.7.2

But installing vllm>=0.7.2 gives:

vllm 0.7.3 requires torch==2.5.1, but you have torch 2.6.0 which is incompatible.

I am getting rubbish output when doing continuous batch inference on long-context inputs (4-8k tokens) with vLLM.
Is anyone else facing a similar issue?
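
For context, the call pattern looks roughly like this (a minimal sketch; the model id, prompt contents, and sampling parameters are placeholders, not the exact setup from this thread):

from vllm import LLM, SamplingParams

# Model id is a placeholder for the checkpoint from this model card.
llm = LLM(model="microsoft/Phi-4-mini-instruct", max_model_len=8192)

# A batch of long prompts (the problematic 4-8k token range); contents are dummy text.
prompts = [
    ("Some long reference document. " * 800) + f"\n\nSummarize document {i}."
    for i in range(16)
]

# Illustrative sampling parameters only.
params = SamplingParams(temperature=0.0, max_tokens=256)

# vLLM schedules these requests with continuous batching internally.
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text[:200])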

Solved it :P
You need to use the pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel image.
First install vllm==0.7.3, and then reinstall torch 2.6.0 on top of it:

pip3 install torch==2.6.0 torchvision --index-url https://download.pytorch.org/whl/test/cu124
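
After that, a quick sanity check inside the container confirms the environment matches the model card recommendation (a minimal sketch that just prints the resolved versions):

import torch
import vllm
import flash_attn

# Expected per the model card: torch 2.6.0, vllm 0.7.3, flash_attn 2.7.4.post1
print("torch:", torch.__version__)
print("vllm:", vllm.__version__)
print("flash_attn:", flash_attn.__version__)
print("CUDA available:", torch.cuda.is_available())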

nguyenbh changed discussion status to closed
