Out of memory issue.

#34

by kxgong - opened Jan 22, 2024

Discussion

kxgong

Jan 22, 2024

Hi, I used the recommended way (from_pretrained(***)) to load mixtral-8x7B, but it fails with an out-of-memory error.

I am running this on 8 x A100 GPUs. What is the problem?

Thank you.

ybelkada

Jan 22, 2024

Hi @kxgong
I suggest loading the model in half precision (torch_dtype=torch.float16) or in 4-bit precision (load_in_4bit=True) so that it is loaded in the most memory-efficient way possible.
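To make the suggestion above concrete, here is a minimal sketch, not a definitive recipe. It assumes roughly 46.7 billion total parameters for Mixtral-8x7B, uses the repo id mistralai/Mixtral-8x7B-v0.1 as a placeholder for whichever checkpoint you are loading, and adds device_map="auto" (which needs accelerate installed) to spread the weights over the visible GPUs. The helper names weight_memory_gb and load_mixtral are hypothetical. Under these assumptions the weights alone come to roughly 187 GB in float32, 93 GB in float16, and 23 GB in 4-bit, before activations, optimizer state, or quantization overhead.

```python
# Assumption: approximate total parameter count of Mixtral-8x7B.
MIXTRAL_PARAMS = 46.7e9

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "4bit": 0.5}


def weight_memory_gb(num_params, precision):
    """Approximate memory for the weights alone, in GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def load_mixtral(model_id="mistralai/Mixtral-8x7B-v0.1", four_bit=False):
    """Hypothetical helper: load in float16, or in 4-bit via bitsandbytes."""
    # Imported lazily so the estimate above runs without these libraries.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Assumption: shard the model across all visible GPUs.
    kwargs = {"device_map": "auto"}
    if four_bit:
        kwargs["quantization_config"] = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
        )
    else:
        kwargs["torch_dtype"] = torch.float16
    return AutoModelForCausalLM.from_pretrained(model_id, **kwargs)


# Print the rough weight footprint at each precision.
for precision in BYTES_PER_PARAM:
    print(precision, round(weight_memory_gb(MIXTRAL_PARAMS, precision), 1), "GB")
```

Calling load_mixtral(four_bit=True) would then download and load the quantized model. Passing load_in_4bit=True directly to from_pretrained, as written above, was the shorthand at the time of this thread; BitsAndBytesConfig is the explicit form of the same option.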

kxgong

Jan 23, 2024

Thank you. I am using mixtral-8x7B for training. I wonder whether using 4-bit will cause a performance drop.

ybelkada

Jan 23, 2024

@kxgong if you use QLoRA, you shouldn't expect a performance drop compared with full fine-tuning. You can read more about QLoRA here: https://huggingface.co/blog/4bit-transformers-bitsandbytes and, to get started, see resources on how to run QLoRA, such as this blog post: https://pytorch.org/blog/finetune-llms/
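As a rough illustration of why QLoRA stays cheap to train, here is a hedged sketch using the peft library. The rank, alpha, dropout, and target module names below are assumptions chosen for illustration, not values recommended in this thread, and lora_param_count and add_lora_adapters are hypothetical helper names. The idea is that the 4-bit base weights stay frozen and only small low-rank adapter matrices are trained.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns A (r x d_in) and B (d_out x r), so the total is r * (d_in + d_out).
    """
    return r * (d_in + d_out)


def add_lora_adapters(model, r=16, lora_alpha=32, lora_dropout=0.05):
    """Hypothetical helper: attach LoRA adapters to a 4-bit-loaded model."""
    # Imported lazily so the arithmetic above runs without peft installed.
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # Prepares the quantized model for training (e.g. casts some layers).
    model = prepare_model_for_kbit_training(model)
    config = LoraConfig(
        r=r,
        lora_alpha=lora_alpha,
        lora_dropout=lora_dropout,
        # Assumption: adapt only the attention projections.
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, config)


# Illustrative sizes: a 4096 x 4096 projection holds about 16.8M weights,
# while a rank-16 adapter on it adds far fewer trainable parameters.
print(lora_param_count(4096, 4096, 16))
```

With a model loaded in 4-bit, add_lora_adapters(model) would return a PEFT-wrapped model whose trainable parameters are only the adapters; the wrapped model's print_trainable_parameters() method reports the exact count.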

kxgong

Jan 30, 2024

Thanks for your help.
