shimmyshimmer committed
Commit 181670a · verified · 1 Parent(s): 37354dc

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -20,7 +20,7 @@ Or you can view more detailed instructions here: [unsloth.ai/blog/deepseek-r1](h
 3. Example with Q8_0 K quantized cache **Notice -no-cnv disables auto conversation mode**
 ```bash
 ./llama.cpp/llama-cli \
-    --model unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf \
+    --model unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf \
     --cache-type-k q8_0 \
     --threads 16 \
     --prompt '<|User|>What is 1+1?<|Assistant|>' \
@@ -40,7 +40,7 @@ Or you can view more detailed instructions here: [unsloth.ai/blog/deepseek-r1](h
 4. If you have a GPU (RTX 4090 for example) with 24GB, you can offload multiple layers to the GPU for faster processing. If you have multiple GPUs, you can probably offload more layers.
 ```bash
 ./llama.cpp/llama-cli \
-    --model unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
+    --model unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf
     --cache-type-k q8_0
     --threads 16
     --prompt '<|User|>What is 1+1?<|Assistant|>'
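For context, the GPU-offload step (step 4 in the README being edited) can be sketched as a full command. The layer count passed to `--n-gpu-layers` below is an illustrative assumption for a 24GB card, not a value taken from this commit; `--n-gpu-layers` (alias `-ngl`) is the standard llama.cpp flag for offloading transformer layers to the GPU.

```bash
# Sketch only: combines the quantized K-cache example with GPU offload.
# -ngl 28 is a guess for a 24GB GPU, not a value from this commit -- raise or
# lower it until the model fits in VRAM. -no-cnv disables conversation mode,
# as noted in the README.
./llama.cpp/llama-cli \
    --model unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf \
    --cache-type-k q8_0 \
    --threads 16 \
    --n-gpu-layers 28 \
    -no-cnv \
    --prompt '<|User|>What is 1+1?<|Assistant|>'
```

Offloading more layers speeds up generation until VRAM is exhausted; the 1.5B Q4_K_M model is small enough that all layers typically fit on a 24GB GPU.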