freefallr committed
Commit fc13ba8
1 Parent(s): 3051452

Update README.md

Files changed (1):
  1. README.md +4 -7
README.md CHANGED
@@ -28,23 +28,20 @@ This model was created by [jphme](https://huggingface.co/jphme). It's a fine-tun
  | **Quantization Formats** | 8 Bit, 5 Bit (K_M) |


- ## Deploy from source
- 1. Clone and install llama.cpp *(at the time of writing, we used commit 9e20231)*.
  ```
- # Install llama.cpp by cloning the repo from GitHub.
- # Once cloned:
  cd llama.cpp && make
  ```
  2. Use the provided `convert.py` file to convert the original model to GGUF with FP16 precision.
  ```
- python3 llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german --outtype f16 --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin
  ```
  3. The converted GGUF model with FP16 precision will then be used for further quantization to 8 Bit, 5 Bit (K_M) and 4 Bit (K_M).
  ```
- # Quantize GGUF (FP16) to 8 Bit, 5 Bit (K_M) and 4 Bit (K_M)
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q8_0.bin q8_0
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q5_K_M.bin q5_K_M
- ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q4_K_M.bin q4_K_M
  ```
  ___
 
 
  | **Quantization Formats** | 8 Bit, 5 Bit (K_M) |


+ ## How to quantize
+ 1. Clone llama.cpp *(at the time of writing, we used commit 9e20231)*, then compile it.
  ```
  git clone https://github.com/ggerganov/llama.cpp
  cd llama.cpp && make
  ```
  2. Use the provided `convert.py` file to convert the original model to GGUF with FP16 precision.
  ```
+ python llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german --outtype f16 --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin
  ```
  3. The converted GGUF model with FP16 precision will then be used for further quantization to 8 Bit and 5 Bit (K_M).
  ```
+ # Quantize GGUF (FP16) to 8 Bit and 5 Bit (K_M)
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q8_0.bin q8_0
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q5_K_M.bin q5_K_M
  ```
  ___
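Once quantized, a quick way to verify the output files is to load one with the `main` example binary that the same llama.cpp build produces. A minimal sketch, assuming the q5_K_M file from step 3 and the `main` binary built in step 1; the prompt and token count are illustrative, not from the original README:
```
# Hypothetical smoke test: load the quantized model and generate 128 tokens.
./llama.cpp/main -m Llama-2-13b-chat-german-GGUF.q5_K_M.bin -p "Schreibe eine kurze Begrüßung." -n 128
```
If the model loads and produces coherent German text, the conversion and quantization steps completed as expected.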