I don't know why it won't fit into an RTX 3090
#1 opened by DrNicefellow
Because the vicuna-34B-GPTQ can fit into the card with the exllama loader, but this one requires more than 40GB of VRAM. Much more emergent features?
You should limit the ctx len to shrink the pre-allocated KV cache used by exllama (see the sketch below). The original 4K ctx len Yi-34B 4-bit GPTQ model could fit in 21GB of VRAM.
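The weights themselves are only on the order of ~18GB at 4-bit; the extra VRAM goes to the KV cache, which exllama pre-allocates for the full configured context length, so a variant whose config.json defaults to a much longer context than 4K reserves far more memory up front. In case it helps, here's a minimal sketch of capping the context with the exllamav2 Python API (the model path and the 4096 value are just examples; webui users can set max_seq_len in the loader settings instead):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache

config = ExLlamaV2Config()
config.model_dir = "/path/to/Yi-34B-GPTQ"  # hypothetical local path to the GPTQ model
config.prepare()                           # reads config.json (default ctx len may be very long)
config.max_seq_len = 4096                  # cap the context so the KV cache is sized for 4K tokens

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)   # cache allocation follows max_seq_len
model.load_autosplit(cache)                # load weights, splitting across available GPUs
```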
Don't worry, it doesn't even fit in 48GB of VRAM. :D