low memory usage

#10
by Knut-J - opened

Is there any way to run this with less memory? My GPU only has 24 GB. I get this error message: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 462.00 MiB. GPU

@Knut-J 24GB isn't gonna cut it for this beast of a model. NVLM-D 72B is huge. But don't give up yet! Try these tricks:

  • CPU offloading: Use device_map="auto" when loading the model. It'll be slow as molasses, but it might just work.
  • 8-bit quantization: Add load_in_8bit=True to your model loading. It'll sacrifice some quality, but hey, beggars can't be choosers.
  • Last resort: Downgrade to a smaller model. Sometimes you gotta know when to fold 'em.

Fair warning: These hacks might make your inference slower than a snail on tranquilizers. But if you're dead set on using this model, it's worth a shot. Good luck!
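
For a rough sense of why 24 GB falls short, the weight memory can be estimated as parameter count times bytes per parameter. The sketch below assumes roughly 72 billion parameters (taken from the model name, so treat it as approximate) and counts weights only; activations, the KV cache, and framework overhead come on top. The function name is made up for illustration.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 72e9    # assumption: ~72B parameters, from the "72B" in the model name
GPU_MEMORY_GB = 24   # the single GPU from the question

for label, bits in [("bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    needed = weight_memory_gb(NUM_PARAMS, bits)
    verdict = "fits" if needed <= GPU_MEMORY_GB else "does not fit"
    print(f"{label}: ~{needed:.0f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")
```

Under these assumptions even the 8-bit weights alone exceed a single 24 GB card, so quantization would have to be combined with offloading to CPU or disk.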

Anybody use an external hard drive to run this?

Hi,

I tried to run the inference code given here on AWS p3dn.24xlarge and p4de.24xlarge instances, but I am running into a disk space error:
OSError: [Errno 28] No space left on device

image.png

specs of the instance

image.png

I have tried following the tips given here: https://discuss.huggingface.co/t/no-space-left-on-device-when-downloading-a-large-model-for-the-sagemaker-training-job/43643

Any help is appreciated. Please let me know if I am missing something.
Thanks in advance.

NVIDIA org

Hi Malini,

Thank you for your interest in our model.

The first screenshot shows that your home directory does not have enough disk space. Running NVLM-D requires roughly 200GB of disk space.

Your second screenshot suggests that you likely have separate disks.
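
To confirm which disk has room before downloading, free space can be checked per path. This is a minimal sketch using only the Python standard library; the 200 GB threshold is the approximate figure mentioned above, and the helper names and candidate paths are placeholders for illustration.

```python
import os
import shutil

REQUIRED_GB = 200  # approximate disk space needed, as quoted above


def free_space_gb(path: str) -> float:
    """Free space on the filesystem that contains `path`, in GB (1e9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path: str, required_gb: float = REQUIRED_GB) -> bool:
    """True if the filesystem holding `path` has at least `required_gb` free."""
    return free_space_gb(path) >= required_gb


# Placeholder candidates: swap in your home directory and the large data mount.
for candidate in [os.path.expanduser("~"), "."]:
    status = "OK" if has_room(candidate) else "too small"
    print(f"{candidate}: {free_space_gb(candidate):.0f} GB free ({status})")
```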

Please try running the following commands on the disk with 1000GB.

  1. Install Git Large File Storage (LFS) by running:

git lfs install

  2. Clone the NVLM-D repository using:

git clone https://huggingface.co/nvidia/NVLM-D-72B

(say this clones the model into your local path "path/to/NVLM-D-72B")

  3. Load the model from your local path:

import torch
from transformers import AutoModel

path = "path/to/NVLM-D-72B"
# split_model() is assumed to be the helper from the model card's inference
# example; it returns a device_map that spreads the layers across your GPUs.
device_map = split_model()
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=False,
    trust_remote_code=True,
    device_map=device_map).eval()
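
If you would rather let from_pretrained download the model than clone it by hand, another option is to point the Hugging Face cache at the larger disk instead of the home directory. A minimal sketch, assuming a hypothetical mount point "/mnt/data" and a made-up folder name; HF_HOME is the environment variable the Hugging Face libraries read for their cache location, and it needs to be set before those libraries are imported.

```python
import os


def use_hf_cache_on(mount_point: str) -> str:
    """Point the Hugging Face cache at a folder under `mount_point`."""
    cache_dir = os.path.join(mount_point, "hf_cache")  # hypothetical folder name
    os.environ["HF_HOME"] = cache_dir
    return cache_dir


# "/mnt/data" is a placeholder for the 1000GB disk. Call this before importing
# transformers or huggingface_hub so the setting takes effect.
cache_dir = use_hf_cache_on("/mnt/data")
print("Hugging Face cache:", cache_dir)
```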

Let me know if you encounter any issues, and I’d be happy to assist further!

Best,
Boxin

Update!
Thank you @boxin-wbx! The issue was fixed once I installed Git LFS and cloned the repo.
I did need to run the device_map step to run inference on text.
When I run inference on images, I get a memory error. Sharing the screenshot below.

image.png

NVIDIA org

We haven't tested on V100 before. But a node with 2 H100 / A100 GPUs (each with 80GB of memory) should work.

Thanks @boxin-wbx. It worked on an ml.p4de.24xlarge instance. I appreciate your input.
