freefallr committed on
Commit
da28da9
1 Parent(s): 61d46ef

Update README.md

Files changed (1)
  1. README.md +19 -9
README.md CHANGED
@@ -16,28 +16,38 @@ datasets:
  ---
  # Llama 2 13b Chat German - GGUF
 
- This repository contains [jphme/Llama-2-13b-chat-german](https://huggingface.co/jphme/Llama-2-13b-chat-german) in GGUF format.
  The original model was created by [jphme](https://huggingface.co/jphme) and is a fine-tune of [Llama2 13b Chat](https://huggingface.co/meta-llama/Llama-2-13b-chat) from Meta, trained on German instructions.
 
-
  ## Model Profile
- The model profile describes the properties
  |Property|Details|
  |----------------------------|--------------------------------------------------------------------------------------------------------------|
  | **Model** | [jphme/Llama-2-13b-chat-german](https://huggingface.co/jphme/Llama-2-13b-chat-german) |
  | **Format** | GGUF |
- | **Quantization Types** | - 8 Bit <br>- 5 Bit K_M |
- | **Conversion Tool** | llama.cpp (Commit: 9e20231) |
  | **Original Model Creator** | [jphme](https://huggingface.co/jphme) |
- | **Training Data** | Proprietary German Conversation Dataset, German SQuAD, German legal SQuAD data, augmented with "wrong" contexts, to improve factual RAG |
 
  ## Replication Steps
- Clone and install llama.cpp *(Commit: 9e20231)* and use the provided `convert.py` file to convert the original model to GGUF with FP16 precision. The converted model is then used for further quantization.
  ```
- # Convert original model to GGUF format with FP16 precision
  python3 llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german --outtype f16 --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin
 
- # Quantize FP16 GGUF to 8, 5_K_M and 4_K_M bit
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q8_0.bin q8_0
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q5_K_M.bin q5_K_M
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q4_K_M.bin q4_K_M
  ---
  # Llama 2 13b Chat German - GGUF
 
+ This repository contains the model [jphme/Llama-2-13b-chat-german](https://huggingface.co/jphme/Llama-2-13b-chat-german) in GGUF format.
  The original model was created by [jphme](https://huggingface.co/jphme) and is a fine-tune of [Llama2 13b Chat](https://huggingface.co/meta-llama/Llama-2-13b-chat) from Meta, trained on German instructions.
 
  ## Model Profile
+ The model profile below summarizes the key properties of this model.
  |Property|Details|
  |----------------------------|--------------------------------------------------------------------------------------------------------------|
  | **Model** | [jphme/Llama-2-13b-chat-german](https://huggingface.co/jphme/Llama-2-13b-chat-german) |
+ | **Type** | Text Generation |
  | **Format** | GGUF |
+ | **Quantization Types** | 8 Bit <br>5 Bit (K_M) |
+ | **Conversion Tool** | llama.cpp (Commit 9e20231) |
  | **Original Model Creator** | [jphme](https://huggingface.co/jphme) |
+ | **Training Data** | Proprietary German Conversation Dataset, German SQuAD, German legal SQuAD data, augmented with "wrong" contexts to improve factual RAG. For details, see the original model link. |
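The replication steps below assume a local copy of the original model in `./original-models/`. Obtaining it is not covered by the original README; one way is a plain git-lfs clone of the source repository. The following is a minimal sketch and assumes `git-lfs` is installed and that you have enough disk space for the full FP16 weights:

```
# Fetch the original model into the path used by the conversion step (requires git-lfs)
git lfs install
git clone https://huggingface.co/jphme/Llama-2-13b-chat-german ./original-models/Llama-2-13b-chat-german
```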
 
  ## Replication Steps
+ 1. Clone and install llama.cpp *(Commit: 9e20231)*.
+ ```
+ # Install llama.cpp by cloning the repository, checking out the pinned commit and compiling it
+ git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
+ git checkout 9e20231
+ make
+ ```
+ 2. Use the provided `convert.py` script to convert the original model to GGUF with FP16 precision.
  ```
+ # Convert the original model to GGUF format with FP16 precision.
+ # Adjust the file paths and model names as needed; convert.py needs the Python packages listed in llama.cpp/requirements.txt.
  python3 llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german --outtype f16 --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin
+ ```
+ 3. Quantize the converted FP16 GGUF model further to 8 Bit, 5 Bit (K_M) and 4 Bit (K_M); a quick way to sanity-check the result is shown after these steps.
+ ```
+ # Quantize the FP16 GGUF model to 8, 5 (K_M) and 4 (K_M) bit
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q8_0.bin q8_0
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q5_K_M.bin q5_K_M
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q4_K_M.bin q4_K_M
+ ```
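After quantization, the resulting files can be sanity-checked directly with llama.cpp's CLI. The command below is a minimal sketch rather than part of the original README: the file name, prompt and token count are illustrative, and it assumes you run it from the directory containing the quantized file and the compiled llama.cpp binaries.

```
# Run a short German prompt against the 5 Bit (K_M) quantization (file name and prompt are illustrative).
# The prompt means "Explain in two sentences what a language model is."
./llama.cpp/main -m Llama-2-13b-chat-german-GGUF.q5_K_M.bin -p "Erkläre in zwei Sätzen, was ein Sprachmodell ist." -n 128
```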