Update README.md
datasets:
- Christoph911/German-legal-SQuAD
- philschmid/test_german_squad
---

# Introduction

This repository contains the model [jphme/Llama-2-13b-chat-german](https://huggingface.co/jphme/Llama-2-13b-chat-german) in GGUF format.

This model was created by [jphme](https://huggingface.co/jphme). It is a fine-tuned variant of Meta's [Llama2 13b Chat](https://huggingface.co/meta-llama/Llama-2-13b-chat), trained on a compilation of multiple German-language instruction datasets.

# Model Details

| Attribute                  | Details                                                                                 |
|----------------------------|-----------------------------------------------------------------------------------------|
| **Name**                   | [jphme/Llama-2-13b-chat-german](https://huggingface.co/jphme/Llama-2-13b-chat-german)   |
| **Creator**                | [jphme](https://huggingface.co/jphme)                                                   |
| **File Format**            | GGUF                                                                                    |
| **Quantization Formats**   | 8 Bit, 5 Bit (K_M)                                                                      |

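If you only want to use the pre-converted files, you do not need to rebuild them yourself. Below is a minimal sketch of fetching one quantized file with `huggingface-cli`; the repository id is a placeholder and a recent `huggingface_hub` is assumed.

```
# Hypothetical example: replace <repo-id> with the actual id of this repository
# and pick the quantization you need.
huggingface-cli download <repo-id> Llama-2-13b-chat-german-GGUF.q5_K_M.bin --local-dir ./models
```
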
## Deploy from source

1. Clone and install llama.cpp *(at time of writing, we used commit 9e20231)*.
```
# Install llama.cpp by cloning the repo from GitHub.
# Once cloned, build it:
cd llama.cpp && make
```
2. Use the provided `convert.py` script to convert the original model to GGUF with FP16 precision.
```
# Convert the original model to GGUF (FP16); adjust file paths and model names as needed.
python3 llama.cpp/convert.py ./original-models/Llama-2-13b-chat-german --outtype f16 --outfile ./converted_gguf/Llama-2-13b-chat-german-GGUF.fp16.bin
```
3. The converted FP16 GGUF model is then quantized further to 8 Bit, 5 Bit (K_M) and 4 Bit (K_M).
```
# Quantize GGUF (FP16) to 8, 5 (K_M) and 4 (K_M) bit
./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q8_0.bin q8_0
./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q5_K_M.bin q5_K_M
./llama.cpp/quantize Llama-2-13b-chat-german-GGUF.fp16.bin Llama-2-13b-chat-german-GGUF.q4_K_M.bin q4_K_M
```
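
Once a quantized file exists, a quick way to smoke-test it is to run it with llama.cpp's `main` binary built in step 1. The prompt and token count below are only examples.

```
# Run a short generation with the 5-bit model to verify the quantized file loads.
# Prompt: "Was ist die Hauptstadt von Deutschland?" ("What is the capital of Germany?")
./llama.cpp/main -m ./Llama-2-13b-chat-german-GGUF.q5_K_M.bin -p "Was ist die Hauptstadt von Deutschland?" -n 128
```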