freefallr committed
Commit
2b63b4c
1 Parent(s): e1606d4

Update README.md

Files changed (1): README.md (+2 −1)
README.md CHANGED
@@ -30,6 +30,7 @@ This model was created by [jphme](https://huggingface.co/jphme). It's a fine-tun
 
  ## Quantization Process
  If you want to create your own GGUF quantizations of HuggingFace models, use llama.cpp.
+
  1. Clone and install llama.cpp *(at time of writing, we used commit 9e20231)*, then compile.
  ```
  cd llama.cpp && make
@@ -38,7 +39,7 @@ cd llama.cpp && make
  ```
  python llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german --outtype f16 --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin
  ```
- 3. The converted GGUF model with FP16 precision will then be used to do further quantization to 8 Bit, 5 Bit (K_M) and 4 Bit (K_M).
+ 3. The converted GGUF model with FP16 precision will then be used to do further quantization to 8 Bit and 5 Bit (K_M).
  ```
  # Quantize GGUF (FP16) to 8 Bit and 5 Bit (K_M)
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q8_0.bin q8_0
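
Putting the updated steps together, the full pipeline reads roughly as the sketch below. This is a minimal sketch, not text from the commit: the repository URL, the `pip install -r requirements.txt` step, the q5_K_M invocation, and the exact output paths are assumptions filled in around the diff's commands.

```
# 1. Clone llama.cpp, pin the commit the README names (9e20231), and compile.
#    (Repository URL assumed; the diff does not spell it out.)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && git checkout 9e20231 && make
# convert.py needs Python dependencies; requirements.txt at the repo root is assumed here.
pip install -r requirements.txt
cd ..

# 2. Convert the HuggingFace model to GGUF at FP16 precision (as in the diff).
python llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german \
    --outtype f16 \
    --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin

# 3. Quantize the FP16 GGUF to 8 Bit and 5 Bit (K_M). The q8_0 call matches the
#    diff; the q5_K_M call and its output filename are assumptions based on the
#    README's step text, which the diff cuts off after the q8_0 line.
./llama.cpp/quantize ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin \
    ./converted_gguf/Llama-2-13b-chat-german-GGUF.q8_0.bin q8_0
./llama.cpp/quantize ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin \
    ./converted_gguf/Llama-2-13b-chat-german-GGUF.q5_K_M.bin q5_K_M
```

A quick smoke test such as `./llama.cpp/main -m ./converted_gguf/Llama-2-13b-chat-german-GGUF.q5_K_M.bin -p "Hallo"` (prompt is a placeholder) should confirm the quantized file loads and generates.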