Update README.md
README.md
This model was created by [jphme](https://huggingface.co/jphme). It's a fine-tuned version of Llama 2 13b Chat.
| **Quantization Formats** | 8 Bit, 5 Bit (K_M) |

## How to quantize

1. Clone and install llama.cpp *(at time of writing, we used commit 9e20231)*, then compile.
```
# Install llama.cpp by cloning the repo from GitHub, then build
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make
```
2. Use the provided `convert.py` file to convert the original model to GGUF with FP16 precision.
```
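# Assumes the original Hugging Face checkpoint is already in ./original-models/.
# One hypothetical way to fetch it (requires git-lfs; repo id inferred from the model name):
# git clone https://huggingface.co/jphme/Llama-2-13b-chat-german ./original-models/Llama-2-13b-chat-german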
python llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german --outtype f16 --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin
```
3. The converted GGUF model with FP16 precision is then quantized further to 8 Bit and 5 Bit (K_M).
```
# Quantize GGUF (FP16) to 8 Bit and 5 Bit (K_M)
./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q8_0.bin q8_0
./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q5_K_M.bin q5_K_M
```
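As a quick sanity check (not part of the original steps), a quantized file can be loaded directly with llama.cpp's `main` binary; the prompt and token count below are placeholder values:
```
# Smoke test: generate a short completion with the 5 Bit (K_M) model (example prompt)
./llama.cpp/main -m Llama-2-13b-chat-german-GGUF.q5_K_M.bin -p "Wie funktioniert die Quantisierung?" -n 128
```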
___