Apply for community grant: Academic project (gpu)
Hi,
The model fails to load onto the GPU with an Nvidia A10G small and other GPUs with less RAM + VRAM. It would be great if we could get an Nvidia A10G large or a larger GPU.
Disclaimer: This project is not mine; the credit for this amazing work goes to @liuhaotian et al. I just integrated their Gradio demo into this Hugging Face Space.
This is a brilliant demo, @badayvedat!
We wanted to let you know that we've assigned a GPU to your space, and your GPU grant application has been approved. Congratulations! Please keep in mind that GPU grants are provided on a temporary basis and may be removed if usage is very low.
To learn more about GPUs in Spaces, please check out https://huggingface.co./docs/hub/spaces-gpus. We look forward to seeing the innovative work you produce with this grant. If you have any questions or concerns, please let us know. Thank you for your interest in our platform!
Hi @badayvedat,
This Space is currently running the 7B model on an A10G large. Could you update your code to load the model in 4-bit (or 8-bit)?
I've only tested on my AWS environment with a T4, but it's possible to run the 7B model on a T4 if we load it in 8-bit, and even the 13B model can run on a T4 if loaded in 4-bit. This is mentioned in the README.md of the original repo.
Also, regarding CPU RAM: when I load the 7B model in 8-bit or the 13B model in 4-bit, the maximum CPU RAM usage seems to be less than 15GB in both cases. I think you can pass these kwargs to LlavaLlamaForCausalLM.from_pretrained here.
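A minimal sketch of what that change could look like, assuming the Space calls LlavaLlamaForCausalLM.from_pretrained directly; the model ID here is illustrative only, and load_in_8bit / load_in_4bit rely on transformers' bitsandbytes integration:

```python
from llava.model import LlavaLlamaForCausalLM  # from the original LLaVA repo

model = LlavaLlamaForCausalLM.from_pretrained(
    "liuhaotian/llava-v1.5-7b",  # hypothetical model ID, for illustration only
    load_in_8bit=True,    # 8-bit loading fits the 7B model on a 16GB T4
    # load_in_4bit=True,  # alternatively, 4-bit even fits the 13B model on a T4
    device_map="auto",    # requires the accelerate package
)
```

Only one of load_in_8bit / load_in_4bit can be set at a time; both quantize the weights at load time via bitsandbytes instead of keeping them in fp16.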
Hey @ysharma @hysts! With the help of @liuhaotian, we've reduced the memory requirements for the Space by using 4-bit inference and removing the model preload, and it now works on a T4-small machine.
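For the preload removal, a hedged sketch of one way to defer loading until the first request; the module-level cache and the model ID are illustrative, not the Space's actual code:

```python
from llava.model import LlavaLlamaForCausalLM  # from the original LLaVA repo

_model = None  # illustrative module-level cache

def get_model():
    # Load lazily on the first request instead of preloading at import time,
    # so the Space starts up without the weights resident in memory.
    global _model
    if _model is None:
        _model = LlavaLlamaForCausalLM.from_pretrained(
            "liuhaotian/llava-v1.5-7b",  # hypothetical model ID
            load_in_4bit=True,  # 4-bit inference keeps the model within T4 VRAM
            device_map="auto",
        )
    return _model
```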
Cool, probably we can now change it to "T4-Small (16G)" in the instruction section of app.py?
yes! updating it right now.
Awesome! Thanks for the updates!!
Hello, how can I add LLaVA to Bubble.io or another application builder?