vllm support a100

#2
by HuggingLianWang - opened

Can this model be served directly using vLLM on 8xA100 (80GB)?

HuggingLianWang changed discussion title from vllm support to vllm support a100
Cognitive Computations org

Yes, but it will run at around 3.7 tokens per second.

Thank you very much, we will try it.

It succeeded. Inference speed is about 3.5 tokens/s with batch size 1 on 8xA100 (80GB).

Cognitive Computations org

There's a PR that claims to boost it to 30 tokens per second; I haven't tried it, though.

Very good: about 3 tokens/s on 8xA100.
