freefallr committed
Commit fc13ba8
1 Parent(s): 3051452

Update README.md

Files changed (1):
  1. README.md +4 -7
README.md CHANGED
@@ -28,23 +28,20 @@ This model was created by [jphme](https://huggingface.co/jphme). It's a fine-tun
  | **Quantization Formats** | 8 Bit, 5 Bit (K_M) |


- ## Deploy from source
- 1. Clone and install llama.cpp *(at the time of writing, we used commit 9e20231)*.
  ```
- # Install llama.cpp by cloning the repo from GitHub.
- # Once cloned:
  cd llama.cpp && make
  ```
  2. Use the provided `convert.py` file to convert the original model to GGUF with FP16 precision.
  ```
- python3 llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german --outtype f16 --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin
  ```
  3. The converted GGUF model with FP16 precision will then be used for further quantization to 8 Bit, 5 Bit (K_M) and 4 Bit (K_M).
  ```
- # Quantize GGUF (FP16) to 8 Bit, 5 Bit (K_M) and 4 Bit (K_M)
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q8_0.bin q8_0
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q5_K_M.bin q5_K_M
- ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q4_K_M.bin q4_K_M
  ```
  ___
 
 
  | **Quantization Formats** | 8 Bit, 5 Bit (K_M) |


+ ## How to quantize
+ 1. Clone llama.cpp *(at the time of writing, we used commit 9e20231)*, then compile it.
  ```
  git clone https://github.com/ggerganov/llama.cpp
  cd llama.cpp && make
  ```
  2. Use the provided `convert.py` file to convert the original model to GGUF with FP16 precision.
  ```
+ python llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german --outtype f16 --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin
  ```
  3. The converted GGUF model with FP16 precision will then be used for further quantization to 8 Bit and 5 Bit (K_M).
  ```
+ # Quantize GGUF (FP16) to 8 Bit and 5 Bit (K_M)
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q8_0.bin q8_0
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q5_K_M.bin q5_K_M
  ```
  ___
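Once quantized, a quick way to verify the output files is to load one with the `main` example binary that the same llama.cpp build produces. A minimal sketch, assuming the q5_K_M file from step 3 and the `main` binary built in step 1; the prompt and token count are illustrative, not from the original README:
```
# Hypothetical smoke test: load the quantized model and generate 128 tokens.
./llama.cpp/main -m Llama-2-13b-chat-german-GGUF.q5_K_M.bin -p "Schreibe eine kurze Begrüßung." -n 128
```
If the model loads and produces coherent German text, the conversion and quantization steps completed as expected.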