freefallr committed
Commit 6e51123
1 Parent(s): 497ed8b

Update README.md

Files changed (1):
  1. README.md (+4 -12)
README.md CHANGED
@@ -14,26 +14,22 @@ datasets:
   - Christoph911/German-legal-SQuAD
   - philschmid/test_german_squad
  ---
- # WORK IN PROGRESS
- The text below is work-in-progress and subject to change quickly!
-
  # Introduction
  This repository contains the model [jphme/Llama-2-13b-chat-german](https://huggingface.co/jphme/Llama-2-13b-chat-german) in GGUF format.
  This model was created by [jphme](https://huggingface.co/jphme). It is a fine-tuned variant of Meta's [Llama 2 13B Chat](https://huggingface.co/meta-llama/Llama-2-13b-chat), trained on a compilation of multiple German-language instruction datasets.

  # Model Details

- ### General Information
  |Attribute|Details|
  |----------------------------|--------------------------------------------------------------------------------------------------------------|
  | **Name** | [jphme/Llama-2-13b-chat-german](https://huggingface.co/jphme/Llama-2-13b-chat-german) |
  | **Creator** | [jphme](https://huggingface.co/jphme) |
- | **Source** | https://huggingface.co/ |
-
+ | **File Format** | GGUF |
+ | **Quantization Formats** | 8 Bit, 5 Bit (K_M) |


  ## Deploy from source
- 1. Clone and install llama.cpp *(Commit: 9e20231)*.
+ 1. Clone and install llama.cpp *(at the time of writing, we used commit 9e20231)*.
  ```
  # Install llama.cpp by cloning the repo from GitHub.
  # When cloned, then:
@@ -41,15 +37,11 @@ cd llama.cpp && make
  ```
  2. Use the provided `convert.py` file to convert the original model to GGUF with FP16 precision.
  ```
- # This command converts the original model to GGUF format with FP16 precision. Make sure to change the file paths and model names to your desire.
  python3 llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german --outtype f16 --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin
  ```
  3. The converted FP16 GGUF model is then quantized further to 8 Bit, 5 Bit (K_M) and 4 Bit (K_M).
-
  ```
- # 2. Convert original model to GGUF format with FP16 precision
- python3 llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german --outtype f16 --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin
- # 3. Quantize GGUF (FP16) to 8, 5 (K_M) and 4 (K_M) bit
+ # Quantize GGUF (FP16) to 8, 5 (K_M) and 4 (K_M) bit
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q8_0.bin q8_0
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q5_K_M.bin q5_K_M
  ./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q4_K_M.bin q4_K_M
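
Two prerequisites are only hinted at in the steps above: the actual `git clone` of llama.cpp (step 1 keeps it as a comment) and obtaining the original model weights that step 2 converts. Below is a minimal sketch of both, assuming the upstream https://github.com/ggerganov/llama.cpp repository, the commit referenced in the README, and a git-lfs checkout of [jphme/Llama-2-13b-chat-german](https://huggingface.co/jphme/Llama-2-13b-chat-german) into the `./original-models/` path used above.

```
# 1. Clone llama.cpp, optionally pin the commit referenced in the README, and build it.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && git checkout 9e20231 && make && cd ..

# 2. Fetch the original (unquantized) model that convert.py expects under ./original-models/.
#    Requires git-lfs so the weight files are actually downloaded.
git lfs install
git clone https://huggingface.co/jphme/Llama-2-13b-chat-german ./original-models/Llama-2-13b-chat-german
```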
 
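After quantization, the resulting GGUF files can be tried locally with the `main` example binary produced by the same `make` build. This is a usage sketch rather than part of the original README; the prompt, context size, and token count are placeholder values.

```
# Generate a short completion from the 5-bit (K_M) quantized file.
# -m: model path, -p: prompt, -n: max tokens to generate, -c: context length
# (German prompt, roughly: "Briefly explain what the German Civil Code regulates.")
./llama.cpp/main \
  -m Llama-2-13b-chat-german-GGUF.q5_K_M.bin \
  -p "Erkläre kurz, was das Bürgerliche Gesetzbuch regelt." \
  -n 256 -c 2048
```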