This model was created by [jphme](https://huggingface.co/jphme). It's a fine-tuned version of Llama 2 13b Chat.

## Quantization Process
If you want to create your own GGUF quantizations of HuggingFace models, use llama.cpp.
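
If the original model weights are not already on disk, here is a minimal sketch for fetching them with `huggingface_hub`; the repo id `jphme/Llama-2-13b-chat-german` and the local path are assumptions chosen to match the commands below.

```python
# Sketch: download the original HF weights before conversion.
# Repo id and target directory are assumptions; adjust to your setup.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="jphme/Llama-2-13b-chat-german",
    local_dir="./original-models/Llama-2-13b-chat-german",
)
```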
1. Clone llama.cpp *(at the time of writing, we used commit 9e20231)* and compile it.
```
# clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make
```

2. Convert the HuggingFace model to GGUF with FP16 precision.

```
python llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german --outtype f16 --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin
```

3. The converted FP16 GGUF model is then quantized further, to 8 Bit and 5 Bit (K_M).

```
# Quantize GGUF (FP16) to 8 Bit and 5 Bit (K_M)
./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q8_0.bin q8_0
# same pattern for the 5 Bit (K_M) variant; the output name follows the q8_0 example
./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q5_k_m.bin q5_k_m
```
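
The quantized files can be sanity-checked directly from Python. Here is a minimal sketch using the `llama-cpp-python` bindings; using these bindings, the prompt, and the generation settings are assumptions, not part of the original workflow.

```python
# Sketch: load the quantized GGUF file and run a short generation.
# llama-cpp-python (pip install llama-cpp-python) is an assumed runtime;
# any GGUF-capable llama.cpp frontend works similarly.
from llama_cpp import Llama

llm = Llama(model_path="Llama-2-13b-chat-german-GGUF.q8_0.bin")
out = llm("Frage: Was ist Quantisierung? Antwort:", max_tokens=64)
print(out["choices"][0]["text"])
```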