A pure C++ high-performance OpenAI LLM service powered by TensorRT-LLM and GRPS, with support for QWQ.

#22
by zhaocc1106 - opened

grps-trtllm now supports QWQ-32B. Give it a try if you are interested:
https://github.com/NetEase-Media/grps_trtllm/blob/master/docs%2Fqwq.md

zhaocc1106 changed discussion title from A pure C++ high-performance OpenAI LLM service by TensorRT-llm + GRPS. to A pure C++ high-performance OpenAI LLM service powered by TensorRT-LLM and GRPS, with support for QWQ.
