---
base_model:
- CausalLM/35b-beta-long
---
# GGUF quants of CausalLM/35b-beta-long

Available quants:
* IQ4_XS (fits into 24 GiB of VRAM with 8192 context and q4_1 KV cache, with room left for a 2048 ubatch)
* IQ4_NL (fits into 24 GiB of VRAM with 8192 context and q4_1 KV cache)
* Q4_K_M (fits into 24 GiB of VRAM with 6144 context and q4_1 KV cache, also good for CPU inference on Xeon E5-26xx v3/v4)
* Q8_0 (probably isn't practical for anything unless you have a big GPU array; the imatrix was derived from it)
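
The context sizes above depend heavily on the KV cache type, so here is a rough sketch of how the cache size can be estimated. It only uses the ggml block layouts (q4_1 stores 32 values in 20 bytes, f16 uses 2 bytes per value). The function name `kv_cache_bytes` and the architecture numbers in the usage example (40 layers, 64 KV heads, head dim 128) are assumptions for illustration, not values confirmed for this model; read the real ones from the GGUF metadata before trusting the result.

```python
# Rough KV cache size estimate. Hypothetical helper, not part of llama.cpp.
# (bytes per block, elements per block) for common ggml cache types.
BLOCK_LAYOUT = {
    "f16": (2, 1),
    "q8_0": (34, 32),  # fp16 scale + 32 int8 values
    "q4_0": (18, 32),  # fp16 scale + 32 4-bit values
    "q4_1": (20, 32),  # fp16 scale + fp16 min + 32 4-bit values
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type="q4_1"):
    """Bytes needed for the K and V caches combined at a given context size."""
    bytes_per_block, elems_per_block = BLOCK_LAYOUT[cache_type]
    # One K row and one V row per token per layer.
    elements = 2 * n_layers * n_ctx * n_kv_heads * head_dim
    if elements % elems_per_block:
        raise ValueError("element count is not a multiple of the block size")
    return elements // elems_per_block * bytes_per_block


if __name__ == "__main__":
    # ASSUMED architecture values, for illustration only.
    for cache_type in ("f16", "q4_1"):
        size = kv_cache_bytes(40, 64, 128, 8192, cache_type)
        print(f"{cache_type}: {size / 2**30:.3f} GiB")
```

With those assumed numbers, q4_1 needs well under a third of the space of an f16 cache at the same context. This only covers the KV cache; model weights and compute buffers (which grow with ubatch) come on top.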