NeoChen1024/CausalLM_35b-beta-long-GGUF-imatrix

NeoChen1024 committed on Sep 18

Commit a222259 • 1 Parent(s): 9bf8181

Create README.md

Files changed (1)

README.md +10 -0

README.md ADDED

	@@ -0,0 +1,10 @@

+---
+base_model:
+- CausalLM/35b-beta-long
+---
+# GGUF quants of CausalLM/35b-beta-long. This repo provides:
+- IQ4_XS (fits in 24 GiB of VRAM with an 8192-token context and a q4_1 KV cache, with room left for a 2048 ubatch)
+- IQ4_NL (fits in 24 GiB of VRAM with an 8192-token context and a q4_1 KV cache)
+- Q4_K_M (fits in 24 GiB of VRAM with a 6144-token context and a q4_1 KV cache; also good for CPU inference on Xeon E5-26xx v3/v4)
+- Q8_0 (probably not practical unless you have a large GPU array; the imatrix was derived from this quant)
+- BF16 (I don't know whether it has any practical use)
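
The context sizes quoted in the README depend heavily on the q4_1 KV cache. A rough sketch of that arithmetic follows. The block sizes are those of the ggml `f16`, `q8_0`, `q4_1`, and `q4_0` formats (32 elements per block). The architecture numbers in the example (40 layers, 64 KV heads, head dimension 128) are assumptions for illustration and are not stated in the README, so check them against the GGUF metadata before relying on the result. The helper name `kv_cache_bytes` is hypothetical.

```python
# Bytes used to store one 32-element block in each ggml cache type.
# f16: 32 * 2 bytes; q8_0: 2-byte scale + 32 int8;
# q4_1: 2-byte scale + 2-byte min + 16 bytes of 4-bit values;
# q4_0: 2-byte scale + 16 bytes of 4-bit values.
BYTES_PER_BLOCK = {"f16": 64, "q8_0": 34, "q4_1": 20, "q4_0": 18}
BLOCK_SIZE = 32


def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, cache_type="f16"):
    """Estimate KV cache size in bytes (keys plus values, all layers)."""
    # One key vector and one value vector per layer per token.
    elements = 2 * n_layers * n_ctx * n_kv_heads * head_dim
    return elements // BLOCK_SIZE * BYTES_PER_BLOCK[cache_type]


if __name__ == "__main__":
    # ASSUMED architecture values, not taken from the README.
    arch = dict(n_layers=40, n_kv_heads=64, head_dim=128)
    for cache_type in ("f16", "q8_0", "q4_1"):
        size = kv_cache_bytes(n_ctx=8192, cache_type=cache_type, **arch)
        print(f"{cache_type}: {size / 2**30:.3f} GiB")
```

Under those assumed values, an 8192-token cache takes 10 GiB in f16 but about 3.1 GiB in q4_1. That gap is consistent with the README's note that a 4-bit quant plus 8192 context only fits in 24 GiB when the cache is quantized. The estimate ignores compute buffers and other runtime overhead.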