What is the VRAM requirement of this model?
What is the VRAM requirement of this model? I have 8 GB of VRAM and was wondering whether this model can run on that much.
If you have bitsandbytes installed, you should be able to load the model with the load_in_8bit=True param in your AutoModelForCausalLM.from_pretrained() call.
I don't think 8 GB of VRAM is enough for this, unfortunately (especially given that when we go to 32K, the size of the KV cache becomes quite large too) -- we are pushing to decrease this! (e.g., we could do some KV cache quantization similar to what we have done in https://arxiv.org/abs/2303.06865, but it will take time)
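To see why it is tight, here is a rough back-of-envelope estimate (a sketch only: the layer and hidden sizes come from the published LLaMA-2-7B config, and the per-element byte counts assume int8 weights and an fp16 KV cache):

n_layers, hidden_size = 32, 4096      # LLaMA-2-7B config
n_params = 7e9                        # ~7 billion parameters
seq_len, batch_size = 32_768, 1       # 32K-token context, single sequence

weight_bytes = n_params * 1           # 1 byte per parameter with load_in_8bit
kv_bytes = 2 * n_layers * seq_len * batch_size * hidden_size * 2   # keys and values, 2 bytes each in fp16

print(f"weights  ~{weight_bytes / 2**30:.1f} GiB")   # ~6.5 GiB
print(f"KV cache ~{kv_bytes / 2**30:.1f} GiB")       # ~16.0 GiB

So even with 8-bit weights, a full 32K-token KV cache alone is far beyond 8 GB; shorter contexts of course need proportionally less.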
In the meantime, you can go to https://api.together.xyz/playground to play with it!
Ce
How can we load the model using bitsandbytes?
@BajrangWappnet, I think you can just do something like this:
import torch
from transformers import AutoModelForCausalLM

# Load the 32K-context model with 8-bit weights via bitsandbytes.
model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/LLaMA-2-7B-32K",
    trust_remote_code=False,
    torch_dtype=torch.float16,
    load_in_8bit=True,  # requires bitsandbytes; some transformers versions also want device_map="auto"
)
Here's a more detailed example of how to use bitsandbytes: https://github.com/TimDettmers/bitsandbytes/blob/main/examples/int8_inference_huggingface.py
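If the model loads, a quick way to sanity-check generation is something like the following (just a sketch; the prompt is a placeholder):

from transformers import AutoTokenizer

# Tokenize a prompt, run a short generation, and decode the output.
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/LLaMA-2-7B-32K")
inputs = tokenizer("Long-context example prompt:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))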