NeoChen1024
/

CausalLM_35b-beta-long-GGUF-imatrix





---
base_model:
- CausalLM/35b-beta-long
---
# GGUF quants of CausalLM/35b-beta-long

Quants available here:
* IQ4_XS (fits in 24 GiB of VRAM at 8192 context with a q4_1 KV cache, with room left for a 2048 ubatch)
* IQ4_NL (fits in 24 GiB of VRAM at 8192 context with a q4_1 KV cache)
* Q4_K_M (fits in 24 GiB of VRAM at 6144 context with a q4_1 KV cache; also good for CPU inference on E5-26xx v3/v4)
* Q8_0 (probably not practical unless you have a large GPU array; the imatrix was derived from it)
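
The context sizes above depend heavily on the KV cache type, so here is a rough sketch of how to estimate the cache size yourself. The block sizes for `f16`, `q8_0` and `q4_1` follow the ggml block layouts. The model shape used in the example (40 layers, 64 KV heads, head dimension 128) is an assumption for illustration only, not something read from these files, so check the GGUF metadata before trusting the numbers. The function name `kv_cache_bytes` is made up for this example.

```python
# Rough KV cache size estimate for a llama.cpp-style cache.
# Ignores compute buffers and any per-tensor padding.

# cache type -> (bytes per block, elements per block), per ggml block layouts
CACHE_TYPES = {
    "f16": (2, 1),
    "q8_0": (34, 32),  # 32 int8 values + one fp16 scale
    "q4_1": (20, 32),  # 32 4-bit values + fp16 scale + fp16 min
}


def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, cache_type):
    """Bytes needed to hold K and V for every layer at full context."""
    bytes_per_block, block_size = CACHE_TYPES[cache_type]
    # Factor 2 covers the K tensor and the V tensor.
    elements = 2 * n_layers * n_ctx * n_kv_heads * head_dim
    return elements // block_size * bytes_per_block


# ASSUMED model shape, for illustration only.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 40, 64, 128

for cache_type in CACHE_TYPES:
    size = kv_cache_bytes(N_LAYERS, 8192, N_KV_HEADS, HEAD_DIM, cache_type)
    print(f"{cache_type}: {size / 2**30:.3f} GiB at 8192 context")
```

Under those assumed dimensions, a q4_1 cache at 8192 context comes to about 3.1 GiB, against 10 GiB for f16. That difference would explain why the 4-bit quants only fit in 24 GiB with a quantized KV cache.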