NeoChen1024 committed on
Commit
1717a60
1 Parent(s): 9621d61

Update README.md

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -3,8 +3,7 @@ base_model:
 - CausalLM/35b-beta-long
 ---
 # GGUF quants of CausalLM/35b-beta-long, here I have:
-IQ4_XS (fits into 24GiB VRAM + 8192 context with q4_1 KV cache, also room for 2048 ubatch)
-IQ4_NL (fits into 24GiB VRAM + 8192 context with q4_1 KV cache)
-Q4_K_M (fits into 24GiB VRAM + 6144 context with q4_1 KV cache, also good for CPU inference on E5-26xx v3/v4)
-Q8_0 (probably isn't practical for anything unless you have big GPU array, imatrix derived from it)
-BF16 (IDK if there's any use of it)
+* IQ4_XS (fits into 24GiB VRAM + 8192 context with q4_1 KV cache, also room for 2048 ubatch)
+* IQ4_NL (fits into 24GiB VRAM + 8192 context with q4_1 KV cache)
+* Q4_K_M (fits into 24GiB VRAM + 6144 context with q4_1 KV cache, also good for CPU inference on E5-26xx v3/v4)
+* Q8_0 (probably isn't practical for anything unless you have big GPU array, imatrix derived from it)
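The VRAM figures in the list assume full GPU offload with a quantized KV cache. As a minimal sketch, an llama.cpp invocation matching the IQ4_XS line might look like the following (the GGUF file name is hypothetical; the flags are llama.cpp's standard options, so check your build's `--help` for exact spellings):

```shell
# Hypothetical file name for the IQ4_XS quant of CausalLM/35b-beta-long.
# -ngl 99        : offload all layers to the GPU
# -c 8192        : 8192-token context
# -ctk/-ctv q4_1 : quantize the K and V caches to q4_1
# -ub 2048       : physical batch (ubatch) size
./llama-cli -m 35b-beta-long.IQ4_XS.gguf -ngl 99 -c 8192 -ctk q4_1 -ctv q4_1 -ub 2048
```

For the Q4_K_M line, the same command with `-c 6144` and the default ubatch size would match the stated 24 GiB budget.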