Doesn't Generate `<think>` tags

#25
by bingw5 - opened

The response doesn't contain `<think>` and `</think>` tags, only `</details>`. Is this by design?

Sure it does. Use llama.cpp. Run a command along these lines, depending on your system/OS: `.\llama-cli --model QwQ-32B-Q8_0.gguf --temp 0.0 --color --threads 36 --ctx-size 128000`

Are you using Open-WebUI by any chance? When using it with SillyTavern, it produces `<think>` tags for me just fine. I suggest trying the QwQ-32B 8bpw EXL2 quant with TabbyAPI, using DeepSeek-R1-Distill-Qwen-1.5B-4bpw-exl2 as a draft model for speculative decoding, for the best speed and quality.
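If you want to verify programmatically whether the raw response actually contains the tags (rather than a frontend silently stripping or rendering them), a minimal sketch independent of any particular backend — `extract_think` is a hypothetical helper, not part of any library:

```python
import re

def extract_think(response: str):
    """Return the reasoning inside <think>...</think>, or None if absent."""
    m = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    return m.group(1).strip() if m else None

# Tags present: the reasoning block is extracted.
print(extract_think("<think>Let me reason.</think>The answer is 4."))  # → Let me reason.
# Tags absent: returns None, which would indicate the problem described above.
print(extract_think("The answer is 4."))  # → None
```

If this returns None on the raw API response, the tags are genuinely missing from the model output; if it finds them, the frontend is likely consuming them before display.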
