NeoChen1024 committed
Commit 1717a60 · 1 parent: 9621d61 · Update README.md
README.md CHANGED

@@ -3,8 +3,7 @@ base_model:
 - CausalLM/35b-beta-long
 ---
 # GGUF quants of CausalLM/35b-beta-long, here I have:
-IQ4_XS (fits into 24GiB VRAM + 8192 context with q4_1 KV cache, also room for 2048 ubatch)
-IQ4_NL (fits into 24GiB VRAM + 8192 context with q4_1 KV cache)
-Q4_K_M (fits into 24GiB VRAM + 6144 context with q4_1 KV cache, also good for CPU inference on E5-26xx v3/v4)
-Q8_0 (probably isn't practical for anything unless you have big GPU array, imatrix derived from it)
-BF16 (IDK if there's any use of it)
+* IQ4_XS (fits into 24GiB VRAM + 8192 context with q4_1 KV cache, also room for 2048 ubatch)
+* IQ4_NL (fits into 24GiB VRAM + 8192 context with q4_1 KV cache)
+* Q4_K_M (fits into 24GiB VRAM + 6144 context with q4_1 KV cache, also good for CPU inference on E5-26xx v3/v4)
+* Q8_0 (probably isn't practical for anything unless you have big GPU array, imatrix derived from it)
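The per-quant notes in the diff describe concrete llama.cpp settings: context size, q4_1 KV cache, and ubatch. A minimal sketch of how those map onto a llama.cpp command line, using the IQ4_XS figures; the GGUF filename and prompt are hypothetical placeholders, and note that quantizing the V cache requires flash attention in current llama.cpp builds:

```sh
# Minimal sketch: load the IQ4_XS quant fully on GPU with the settings
# from the README (8192 context, q4_1 KV cache, 2048 ubatch).
# The .gguf filename here is a hypothetical placeholder.
./llama-cli -m 35b-beta-long.IQ4_XS.gguf \
    -ngl 99 \
    -c 8192 \
    -ub 2048 \
    -fa \
    --cache-type-k q4_1 \
    --cache-type-v q4_1 \
    -p "Hello"
```

For the Q4_K_M quant, drop `-c` to 6144 per the README's note; for pure CPU inference on the E5-26xx class machines it mentions, omit `-ngl`.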
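The Q8_0 entry says the importance matrix was derived from that quant. A sketch of the usual llama.cpp imatrix workflow under that reading; the calibration corpus and file names are assumptions, as the README does not state them:

```sh
# Compute an importance matrix from the Q8_0 quant over a calibration
# corpus; calibration.txt and imatrix.dat are placeholder names.
./llama-imatrix -m 35b-beta-long.Q8_0.gguf -f calibration.txt -o imatrix.dat -ngl 99

# Apply the imatrix when producing the low-bit quants, e.g. IQ4_XS
# (assumes quantizing from a full-precision BF16 GGUF):
./llama-quantize --imatrix imatrix.dat 35b-beta-long.BF16.gguf 35b-beta-long.IQ4_XS.gguf IQ4_XS
```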