freefallr committed on
Commit
e3ab8bc
1 Parent(s): d8abe61

Update README.md

Files changed (1): README.md (+0 -5)
README.md CHANGED
@@ -5,15 +5,11 @@ license: llama2
 This is a quantized version of h2oai/h2ogpt-4096-llama2-13b-chat, formatted in GGUF format to be run with llama.cpp and similar inference tools.
 
 ## Available Formats
-
-### GGUF
-
 | Format | Bits | Use case |
 | ---- | ---- | ----- |
 | q8_0 | 8 | Original quant method, 8-bit. |
 
 ### Currently in conversion
-
 | Format | Bits | Use case |
 | ---- | ---- | ----- |
 | q3_K_L | 3 | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K |
@@ -30,7 +26,6 @@ This is a quantized version of h2oai/h2ogpt-4096-llama2-13b-chat, formatted in G
 | q6_K | 6 | New k-quant method. Uses GGML_TYPE_Q8_K for all tensors - 6-bit quantization |
 
 # Original Model Card
-
 ---
 inference: false
 language:
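For context, the GGUF files this README describes are intended for llama.cpp. A minimal invocation sketch follows; the model filename is hypothetical (substitute the actual q8_0 file downloaded from this repo), and the flags assume llama.cpp's classic `main` example binary:

```shell
# Hypothetical filename: replace with the GGUF file you downloaded from this repo.
# -m: model path, -p: prompt, -n: number of tokens to generate.
./main -m h2ogpt-4096-llama2-13b-chat.q8_0.gguf -p "What is GGUF?" -n 128
```

Lower-bit k-quants from the "Currently in conversion" table (e.g. q3_K_L, q6_K) would be run the same way, trading output quality for a smaller memory footprint.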
 