Update README.md
datasets:
- Christoph911/German-legal-SQuAD
- philschmid/test_german_squad
---

# Introduction

This repository contains the model [jphme/Llama-2-13b-chat-german](https://huggingface.co/jphme/Llama-2-13b-chat-german) in GGUF format.

This model was created by [jphme](https://huggingface.co/jphme). It is a fine-tuned variant of Meta's [Llama2 13b Chat](https://huggingface.co/meta-llama/Llama-2-13b-chat), trained on a compilation of multiple German-language instruction datasets.

# Model Details

| Attribute                  | Details                                                                                 |
|----------------------------|-----------------------------------------------------------------------------------------|
| **Name**                   | [jphme/Llama-2-13b-chat-german](https://huggingface.co/jphme/Llama-2-13b-chat-german)   |
| **Creator**                | [jphme](https://huggingface.co/jphme)                                                   |
| **File Format**            | GGUF                                                                                    |
| **Quantization Formats**   | 8 Bit, 5 Bit (K_M)                                                                      |

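If you only want to use the pre-converted files, you do not need to rebuild them yourself. Below is a minimal sketch of fetching one quantized file with `huggingface-cli`; the repository id is a placeholder and a recent `huggingface_hub` is assumed.

```
# Hypothetical example: replace <repo-id> with the actual id of this repository
# and pick the quantization you need.
huggingface-cli download <repo-id> Llama-2-13b-chat-german-GGUF.q5_K_M.bin --local-dir ./models
```
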
## Deploy from source

1. Clone and install llama.cpp *(at time of writing, we used commit 9e20231)*.
```
# Install llama.cpp by cloning the repo from GitHub.
# Once cloned, build it:
cd llama.cpp && make
```
2. Use the provided `convert.py` script to convert the original model to GGUF with FP16 precision.
```
# Convert the original model to GGUF (FP16); adjust file paths and model names as needed.
python3 llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german --outtype f16 --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin
```
3. The converted FP16 GGUF model is then quantized further to 8 Bit, 5 Bit (K_M) and 4 Bit (K_M).
```
# Quantize GGUF (FP16) to 8, 5 (K_M) and 4 (K_M) bit
./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q8_0.bin q8_0
./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q5_K_M.bin q5_K_M
./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q4_K_M.bin q4_K_M
```
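
Once a quantized file exists, a quick way to smoke-test it is to run it with llama.cpp's `main` binary built in step 1. The prompt and token count below are only examples.

```
# Run a short generation with the 5-bit model to verify the quantized file loads.
# Prompt: "Was ist die Hauptstadt von Deutschland?" ("What is the capital of Germany?")
./llama.cpp/main -m ./Llama-2-13b-chat-german-GGUF.q5_K_M.bin -p "Was ist die Hauptstadt von Deutschland?" -n 128
```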