vllm support a100
#2
opened by HuggingLianWang
Can this model be served directly with vLLM on 8xA100 (80GB)?
HuggingLianWang changed discussion title from "vllm support" to "vllm support a100"
Yes, but it will run at roughly 3.7 tokens per second.
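For reference, a minimal sketch of how you might launch it, assuming a recent vLLM build with the OpenAI-compatible server; `MODEL_PATH` and the dtype are placeholders, not confirmed settings from this thread:

```shell
# Shard the weights across all 8 A100s via tensor parallelism.
# MODEL_PATH is a placeholder for the actual model directory.
python -m vllm.entrypoints.openai.api_server \
    --model MODEL_PATH \
    --tensor-parallel-size 8 \
    --dtype bfloat16
```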
Thank you very much, we will try it.
It succeeded. Inference speed is about 3.5 tokens/s with batch size 1 on 8xA100 (80GB).
There's a PR that claims to boost it to 30 tokens per second; I haven't tried it, though.
Very good: ~3 tokens/s on 8xA100.