GPU requirement for simply running the model
Hello good people of Databricks!
I'm a grad student and trying out Dolly v2 for a summarization problem using an AWS EC2 instance. I have a limited budget for AWS so cannot afford to experiment much. Can you please guide me?
- What is the GPU requirement for running the model?
- The input prompts are going to be longer (since it's a summarization task). Would a longer input require more memory?
I have used Dolly v1. It's great but slow, probably because I'm running it with 16 GB of GPU memory provided by two Tesla M60s.
Thanks,
Abhilash
Please see https://github.com/databrickslabs/dolly
An A100, though it can work on an A10 in 8-bit.
Yes, longer prompts require more memory. I think you really want at least an A10. M60s aren't really meant for deep learning, though they might work with more memory.
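For a rough sense of why those GPUs come up, here's a back-of-the-envelope sketch (an estimate only: it counts just the weights, ignoring the activations and KV cache that grow with prompt length):
params = 12e9  # dolly-v2-12b
for dtype, bytes_per_param in {"fp16": 2, "int8": 1}.items():
    print(f"{dtype}: ~{params * bytes_per_param / 1e9:.0f} GB of GPU memory for weights alone")
# fp16: ~24 GB -> wants an A100 (40 GB); int8: ~12 GB -> fits an A10 (24 GB)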
Thank you! Helps a lot.
@dfurman where do you set load_in_8bit? Is that in the config.json? Thanks!
@srowen, I tried to run the model on a workstation last night (8c Ryzen CPU, 32GB RAM and RTX3090 GPU). The model appears to load correctly, but the RAM quickly saturates to 100%, with VRAM consumption idling at 2GB (Windows and background apps). I am using the GPU version of torch and set the CUDA device ID to force use of the GPU; torch also correctly identifies the CUDA device. Is more RAM required either way to first load the model, prior to it being transferred to the GPU? Or is the model loaded into RAM either way, with the inference running solely on the GPU (i.e. time for a RAM upgrade :) )?
We've released some smaller models trained on the same data if you'd like to try them. These have 2.8B and 6.9B parameters respectively, compared to the current model's 12B parameters.
https://huggingface.co./databricks/dolly-v2-2-8b
https://huggingface.co./databricks/dolly-v2-6-9b
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-6-9b", device_map="auto", load_in_8bit=True)
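(So load_in_8bit goes in the from_pretrained call, not config.json. As far as I know it also requires the bitsandbytes package to be installed, and device_map="auto" requires accelerate.)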
@KanonKop I have been able to load and run the 12b model on a g5.2xlarge instance on AWS, which has 32GB RAM and an A10 GPU,
with:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-12b")
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-12b", device_map="auto", load_in_8bit=True)
While loading the model it leaked a couple of GB into swap, but it then moved the model onto the GPU and RAM usage went down to below 10GB.
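For what it's worth, here's a sketch of keeping the loading-time RAM spike down (assuming transformers with accelerate and bitsandbytes installed; low_cpu_mem_usage streams the checkpoint shards instead of materializing the whole model in system RAM first, which might help with the 32GB saturation @KanonKop saw):
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "databricks/dolly-v2-12b",
    device_map="auto",        # let accelerate place weights on the GPU as they load
    load_in_8bit=True,        # ~12 GB of VRAM instead of ~24 GB in fp16
    low_cpu_mem_usage=True,   # load shard-by-shard; avoid a full copy in RAM
)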
@jacobgoss thanks for the feedback, the reduced parameter model worked correctly, will try to rerun the full 12b model with 8-bit quantization soon.
@jacobgoss, if that's a Linux host, maybe it helps if you turn the swap off with:
$ sudo swapoff -a
I can't run the 12b or 7b models on a Google Cloud GPU instance with a T4, 7.5GB of memory, and a 100GB disk, using the Debian 10 based Deep Learning VM image (M107, Base CUDA 11.3 preinstalled).
It always fails with a MemoryError. So frustrating.
Has anyone gotten this working as expected?
This is documented in the repo https://github.com/databrickslabs/dolly#training-on-other-instances
A T4 isn't nearly enough, and 7.5GB mem won't work.
You want an A100 for the largest model, and there are notes there for smaller GPUs.
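If a T4 is what you have, the smaller checkpoints are the way to go. A minimal sketch, assuming transformers and accelerate are installed (the prompt is just a placeholder):
import torch
from transformers import pipeline

# The 2.8B model fits in a T4's 16 GB at half precision.
generate_text = pipeline(
    model="databricks/dolly-v2-2-8b",
    torch_dtype=torch.float16,  # T4s don't support bfloat16
    trust_remote_code=True,     # Dolly ships a custom instruction-following pipeline
    device_map="auto",
)
print(generate_text("Explain what a MemoryError means in Python."))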
What is Google Colab, and how can it be used to run these models?
I think the question is answered, no? you seem to be asking something unrelated, too. I'm not sure I understand
Based on @srowen's answers, the minimum GPU requirement to even run this model is an A100, which costs $10k+, so you might not want to keep calling this model "runnable on a home PC" in the future. I bet no one has a $10k GPU in their home PC.
Powering many of these applications is a roughly $10,000 chip that’s become one of the most critical tools in the artificial intelligence industry: The Nvidia A100.
@srowen 1. Where is the topic of this discussion answered?
@srowen 2. So you are closing this because I am asking something unrelated? How is asking about possible GPUs on which to run this unrelated, when the topic is "GPU requirements to run this model"?
@srowen 3. What didn't you understand? Is something wrong with the language or the questions?
Also, most of the comments in this discussion are about not being able to run any of the models with any GPU, or about failing to even load a model; there are no comments about successful runs and inference from the model with any of the GPUs they are trying, and no successful training runs either.
And you conclude this can be closed as resolved; it seems you couldn't care less whether the community is able to actually run and use these models or not.
Hey @jaklan, slow your roll there.
I think you're not reading the docs and discussion above. You don't need an A100; you certainly do not buy one to start using this. These are, obviously, available in the cloud. You can run the 12B model on an A10 or V100 or a T4 (16GB) with 8-bit. In fact, that's what was discussed above. That's about all this thread is about. That's why I don't know how to answer "where is the answer".
You're asking things like "what is Colab?" which is unrelated, and then re-asking the same question.
It's general best practice anywhere to just start new threads for different questions, if your question isn't already answered.
Why so abusive, @srowen?
If you make claims about other accounts, you are stepping out of scope. You are constantly insulting me specifically, you closed this discussion because I am supposedly asking the wrong questions, and you are acting as an admin who doesn't like the community commenting or asking questions.
Reporting you for abuse!
(This is actually Hugging Face.) I don't understand your tone or complaint here. This isn't your question that I answered and deemed finished. You added both the same, and a different, question after. Just don't see any other way to read the timeline?
I am an 'admin' for these repos.
"Closing" a discussion is like marking an issue resolved. I don't get why that's perceived as negative.
You are welcome to report whatever you want, but, I think the discussion speaks for itself.
I will not interact with you more on this. I will interact with normal boring civil threads that 99.9% of people manage here.
> I think the question is answered, no? you seem to be asking something unrelated, too. I'm not sure I understand
No it isn't.
No, I am not asking something unrelated; I am asking something on topic: what is the minimum GPU requirement to run this model, and how?
If you don't understand, why did you answer then? I didn't ask you, I asked the forum/community!
Last time: https://huggingface.co./databricks/dolly-v2-12b/discussions/9#643fc2866fd05d823065341b
I do feel it is appropriate to close discussions that have concluded, where further comments aren't adding anything - re-asking what's been answered, "me too", different questions. Of course, anyone is welcome to start a new discussion, hopefully not a duplicate. It keeps the list of active discussions clean, and keeps separate threads separate.
I understand the question, and your question. I don't understand your puzzlement at the above.
> (This is actually Hugging Face.) I don't understand your tone or complaint here. This isn't your question that I answered and deemed finished. You added both the same, and a different, question after. Just don't see any other way to read the timeline?
> I am an 'admin' for these repos.
> "Closing" a discussion is like marking an issue resolved. I don't get why that's perceived as negative.
> You are welcome to report whatever you want, but, I think the discussion speaks for itself.
> I will not interact with you more on this. I will interact with normal boring civil threads that 99.9% of people manage here.
As you wish 💪 (because of # posts where @srowen claims I've done things which I haven't: I haven't asked unrelated questions, haven't re-asked them, etc., and because of # posts where @srowen insults and attacks me personally ‼️)
https://huggingface.co./databricks/dolly-v2-12b/discussions/47#6440c9417841867cd5b7a068
Try one of these; you can pick any size you want according to your budget:
https://cloud.lambdalabs.com/
https://cloud.coreweave.com/
https://paperspace.com/