Can this model be run with vLLM?
#3 · opened by just1nseo
Thanks for the great work!
Yes. First, install the packages:

pip install openai vllm
Then launch the vLLM server (you may want to tune these arguments for your hardware and memory budget):

vllm serve "open-thoughts/OpenThinker-32B" --tensor-parallel-size=1 --disable-log-requests --enable-chunked-prefill --enable-prefix-caching --max-num-batched-tokens=16192 --max-model-len=8096 --gpu-memory-utilization=0.93
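If you'd rather not run a server at all, vLLM's offline Python API can load the model directly in-process. This is a minimal sketch, assuming a vLLM release recent enough to provide LLM.chat; the sampling values are illustrative, not recommendations:

```python
# Offline inference with vLLM's Python API; mirrors the serve arguments above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="open-thoughts/OpenThinker-32B",
    tensor_parallel_size=1,
    max_model_len=8096,
    gpu_memory_utilization=0.93,
)
params = SamplingParams(temperature=0.7, max_tokens=1024)  # illustrative values
outputs = llm.chat(
    [{"role": "user", "content": "What is 1+1?"}],
    sampling_params=params,
)
print(outputs[0].outputs[0].text)
```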
Once the server is up, you can query it from Python as follows:
import openai

# Point the client at the local vLLM server's OpenAI-compatible endpoint
client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",  # any string works unless vllm serve was given --api-key
)

completion = client.chat.completions.create(
    model="open-thoughts/OpenThinker-32B",
    messages=[{"role": "user", "content": "What is 1+1?"}],
)
print(completion.choices[0].message.content)
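Because OpenThinker-32B tends to produce a long reasoning trace before its final answer, you may prefer to stream tokens as they arrive. The same OpenAI client supports this via stream=True; a short sketch:

```python
# Stream the response token-by-token instead of waiting for the full completion.
stream = client.chat.completions.create(
    model="open-thoughts/OpenThinker-32B",
    messages=[{"role": "user", "content": "What is 1+1?"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta; content can be None on some chunks.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```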