danielhanchen committed
Commit 24f4fc7 · verified · 1 Parent(s): 9123ce8

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED

@@ -20,7 +20,7 @@ Or you can view more detailed instructions here: [unsloth.ai/blog/deepseek-r1](h
 2. Example with K & V quantized cache **Notice -no-cnv disables auto conversation mode**
 ```bash
 ./llama.cpp/llama-cli \
- --model unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF/DeepSeek-R1-Distill-Llama-8B-Q2_K_L.gguf \
+ --model unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --threads 16 \
@@ -40,7 +40,7 @@ Or you can view more detailed instructions here: [unsloth.ai/blog/deepseek-r1](h
 4. If you have a GPU (RTX 4090 for example) with 24GB, you can offload multiple layers to the GPU for faster processing. If you have multiple GPUs, you can probably offload more layers.
 ```bash
 ./llama.cpp/llama-cli \
- --model unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF/DeepSeek-R1-Distill-Llama-8B-Q2_K_L.gguf
+ --model unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf
  --cache-type-k q8_0
  --cache-type-v q8_0
  --threads 16
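
Both hunks cut off mid-command at the edge of the diff context, so for orientation, here is a minimal sketch of what a complete CPU-only invocation with the updated Q4_K_M file might look like. Only `--model`, `--cache-type-k/v`, `--threads`, and `-no-cnv` (named in the README text) come from the source; the `--ctx-size` value and `--prompt` string are illustrative assumptions.

```bash
# Sketch of a complete CPU run implied by the first hunk.
# --ctx-size and --prompt are illustrative placeholders, not from the commit.
./llama.cpp/llama-cli \
    --model unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf \
    --cache-type-k q8_0 \
    --cache-type-v q8_0 \
    --threads 16 \
    --ctx-size 8192 \
    -no-cnv \
    --prompt "Why is the sky blue?"
```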
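The second hunk's step 4 describes GPU offload. A sketch of the same command with layer offloading follows, where `--n-gpu-layers 99` (in effect "offload all layers", since an 8B model has far fewer than 99) is an assumed value, not part of the commit; note also the trailing backslashes, which the second README snippet omits but a multi-line shell command requires.

```bash
# GPU variant for step 4: -ngl / --n-gpu-layers moves transformer layers
# onto the GPU. The value 99 is an assumption for illustration, not from
# the commit.
./llama.cpp/llama-cli \
    --model unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf \
    --cache-type-k q8_0 \
    --cache-type-v q8_0 \
    --threads 16 \
    --n-gpu-layers 99 \
    -no-cnv \
    --prompt "Why is the sky blue?"
```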