freefallr committed on
Commit
da28da9
1 Parent(s): 61d46ef

Update README.md

Files changed (1)
  1. README.md +19 -9
README.md CHANGED
@@ -16,28 +16,38 @@ datasets:
  ---
  # Llama 2 13b Chat German - GGUF
 
- This repository contains [jphme/Llama-2-13b-chat-german](https://huggingface.co/jphme/Llama-2-13b-chat-german) in GGUF format.
  The original model was created by [jphme](https://huggingface.co/jphme) and is a fine-tune of [Llama2 13b Chat](https://huggingface.co/meta-llama/Llama-2-13b-chat) from Meta, trained on German instructions.
 
-
  ## Model Profile
- The model profile describes the properties
  |Property|Details|
  |----------------------------|--------------------------------------------------------------------------------------------------------------|
  | **Model** | [jphme/Llama-2-13b-chat-german](https://huggingface.co/jphme/Llama-2-13b-chat-german) |
  | **Format** | GGUF |
- | **Quantization Types** | - 8 Bit <br>- 5 Bit K_M |
- | **Conversion Tool** | llama.cpp (Commit: 9e20231) |
  | **Original Model Creator** | [jphme](https://huggingface.co/jphme) |
- | **Training Data** | Proprietary German Conversation Dataset, German SQuAD, German legal SQuAD data, augmented with "wrong" contexts, to improve factual RAG |
 
  ## Replication Steps
- Clone and install llama.cpp *(Commit: 9e20231)* and use the provided `convert.py` file to convert the original model to GGUF with FP16 precision. The converted model is then used for further quantization.
  ```
- # Convert original model to GGUF format with FP16 precision
  python3 llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german --outtype f16 --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin
 
- # Quantize FP16 GGUF to 8, 5_K_M and 4_K_M bit
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q8_0.bin q8_0
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q5_K_M.bin q5_K_M
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q4_K_M.bin q4_K_M
  ---
  # Llama 2 13b Chat German - GGUF
 
+ This repository contains the model [jphme/Llama-2-13b-chat-german](https://huggingface.co/jphme/Llama-2-13b-chat-german) in GGUF format.
  The original model was created by [jphme](https://huggingface.co/jphme) and is a fine-tune of [Llama2 13b Chat](https://huggingface.co/meta-llama/Llama-2-13b-chat) from Meta, trained on German instructions.
 
  ## Model Profile
+ The model profile below summarizes the key properties of this model.
  |Property|Details|
  |----------------------------|--------------------------------------------------------------------------------------------------------------|
  | **Model** | [jphme/Llama-2-13b-chat-german](https://huggingface.co/jphme/Llama-2-13b-chat-german) |
+ | **Type** | Text Generation |
  | **Format** | GGUF |
+ | **Quantization Types** | 8 Bit <br>5 Bit (K_M) |
+ | **Conversion Tool** | llama.cpp (Commit 9e20231) |
  | **Original Model Creator** | [jphme](https://huggingface.co/jphme) |
+ | **Training Data** | Proprietary German Conversation Dataset, German SQuAD, German legal SQuAD data, augmented with "wrong" contexts to improve factual RAG. For details, see the original model link. |
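The replication steps below assume a local copy of the original model in `./original-models/`. Obtaining it is not covered by the original README; one way is a plain git-lfs clone of the source repository. The following is a minimal sketch and assumes `git-lfs` is installed and that you have enough disk space for the full FP16 weights:

```
# Fetch the original model into the path used by the conversion step (requires git-lfs)
git lfs install
git clone https://huggingface.co/jphme/Llama-2-13b-chat-german ./original-models/Llama-2-13b-chat-german
```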
 
  ## Replication Steps
+ 1. Clone and install llama.cpp *(Commit: 9e20231)*.
+ ```
+ # Install llama.cpp by cloning the repository, checking out the pinned commit and compiling it
+ git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
+ git checkout 9e20231
+ make
+ ```
+ 2. Use the provided `convert.py` script to convert the original model to GGUF with FP16 precision.
  ```
+ # Convert the original model to GGUF format with FP16 precision.
+ # Adjust the file paths and model names as needed; convert.py needs the Python packages listed in llama.cpp/requirements.txt.
  python3 llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german --outtype f16 --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin
+ ```
+ 3. Quantize the converted FP16 GGUF model further to 8 Bit, 5 Bit (K_M) and 4 Bit (K_M); a quick way to sanity-check the result is shown after these steps.
+ ```
+ # Quantize the FP16 GGUF model to 8, 5 (K_M) and 4 (K_M) bit
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q8_0.bin q8_0
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q5_K_M.bin q5_K_M
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q4_K_M.bin q4_K_M
+ ```
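After quantization, the resulting files can be sanity-checked directly with llama.cpp's CLI. The command below is a minimal sketch rather than part of the original README: the file name, prompt and token count are illustrative, and it assumes you run it from the directory containing the quantized file and the compiled llama.cpp binaries.

```
# Run a short German prompt against the 5 Bit (K_M) quantization (file name and prompt are illustrative).
# The prompt means "Explain in two sentences what a language model is."
./llama.cpp/main -m Llama-2-13b-chat-german-GGUF.q5_K_M.bin -p "Erkläre in zwei Sätzen, was ein Sprachmodell ist." -n 128
```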