NeoChen1024 committed on
Commit
1717a60
1 Parent(s): 9621d61

Update README.md

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -3,8 +3,7 @@ base_model:
 - CausalLM/35b-beta-long
 ---
 # GGUF quants of CausalLM/35b-beta-long, here I have:
-IQ4_XS (fits into 24GiB VRAM + 8192 context with q4_1 KV cache, also room for 2048 ubatch)
-IQ4_NL (fits into 24GiB VRAM + 8192 context with q4_1 KV cache)
-Q4_K_M (fits into 24GiB VRAM + 6144 context with q4_1 KV cache, also good for CPU inference on E5-26xx v3/v4)
-Q8_0 (probably isn't practical for anything unless you have big GPU array, imatrix derived from it)
-BF16 (IDK if there's any use of it)
+* IQ4_XS (fits into 24GiB VRAM + 8192 context with q4_1 KV cache, also room for 2048 ubatch)
+* IQ4_NL (fits into 24GiB VRAM + 8192 context with q4_1 KV cache)
+* Q4_K_M (fits into 24GiB VRAM + 6144 context with q4_1 KV cache, also good for CPU inference on E5-26xx v3/v4)
+* Q8_0 (probably isn't practical for anything unless you have big GPU array, imatrix derived from it)
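The VRAM figures in the list assume full GPU offload with a quantized KV cache. As a minimal sketch, an llama.cpp invocation matching the IQ4_XS line might look like the following (the GGUF file name is hypothetical; the flags are llama.cpp's standard options, so check your build's `--help` for exact spellings):

```shell
# Hypothetical file name for the IQ4_XS quant of CausalLM/35b-beta-long.
# -ngl 99        : offload all layers to the GPU
# -c 8192        : 8192-token context
# -ctk/-ctv q4_1 : quantize the K and V caches to q4_1
# -ub 2048       : physical batch (ubatch) size
./llama-cli -m 35b-beta-long.IQ4_XS.gguf -ngl 99 -c 8192 -ctk q4_1 -ctv q4_1 -ub 2048
```

For the Q4_K_M line, the same command with `-c 6144` and the default ubatch size would match the stated 24 GiB budget.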