This is a quantized version of h2oai/h2ogpt-4096-llama2-13b-chat, formatted in GGUF format to be run with llama.cpp and similar inference tools.

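Once downloaded, a GGUF file can be passed straight to llama.cpp. A minimal sketch, assuming a built llama.cpp checkout; the binary name and model filename below are assumptions, so adjust them to your build and to the file you actually downloaded:

```shell
# Sketch: run the quantized model with llama.cpp's example CLI.
# Binary name (`main`) and model filename are assumptions; adjust to
# your build and download.
./main \
  -m h2ogpt-4096-llama2-13b-chat.q8_0.gguf \
  -c 4096 \
  -n 256 \
  -p "Why is the sky blue?"
```

The `-c 4096` matches the 4096-token context window in the model's name.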
## Available Formats
| Format | Bits | Use case |
| ---- | ---- | ----- |
| q8_0 | 8 | Original quant method, 8-bit. |
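The Bits column translates roughly into file size, since on-disk size scales with bits per weight. A back-of-envelope sketch (plain arithmetic; it ignores GGUF metadata and the per-block scales that k-quants store, so real files run somewhat larger):

```python
def approx_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough on-disk size in GB: parameters times bits, converted to bytes.

    Ignores GGUF metadata and per-block quantization overhead (an
    assumption), so actual files are somewhat larger.
    """
    return n_params * bits_per_weight / 8 / 1e9

# For the 13B model in this repo:
for name, bits in [("q3_K_L", 3), ("q6_K", 6), ("q8_0", 8)]:
    print(f"{name}: ~{approx_size_gb(13e9, bits):.1f} GB")
```

For q8_0 this comes out at about 13 GB before overhead, which is why lower-bit quants matter on consumer hardware.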
### Currently in conversion
| Format | Bits | Use case |
| ---- | ---- | ----- |
| q3_K_L | 3 | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K |
| q6_K | 6 | New k-quant method. Uses GGML_TYPE_Q8_K for all tensors - 6-bit quantization |

# Original Model Card
---
inference: false
language: