low memory usage
#10
by
Knut-J
- opened
Is there any way to reduce memory usage? My GPU only has 24 GB, and I get this error: `torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 462.00 MiB. GPU`
@Knut-J 24 GB isn't gonna cut it for this beast of a model. NVLM-D 72B has 72 billion parameters, which is roughly 144 GB of weights in fp16. But don't give up yet! Try these tricks:
- CPU offloading: Pass device_map="auto" when loading the model so layers that don't fit on the GPU spill over to CPU RAM. It'll be slow as molasses, but it might just work.
- 8-bit quantization: Add load_in_8bit=True (requires the bitsandbytes package) to your model loading. That roughly halves the footprint to ~72 GB, still way over 24 GB, so combine it with offloading. It'll sacrifice some quality, but hey, beggars can't be choosers.
Last resort: Downgrade to a smaller model. Sometimes you gotta know when to fold 'em.
Fair warning: These hacks might make your inference slower than a snail on tranquilizers. But if you're dead set on using this model, it's worth a shot. Good luck!
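Putting the two tricks together, a minimal loading sketch with Hugging Face `transformers` might look like the following. This is an assumption-laden starting point, not a verified recipe: the model ID, the use of `AutoModel` with `trust_remote_code=True`, and the exact quantization kwargs may need adjusting for your `transformers` version and for NVLM-D's custom modeling code.

```python
# Hedged sketch: CPU offloading + 8-bit quantization for a very large model,
# using transformers + bitsandbytes. Kwargs may differ between library versions.
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "nvidia/NVLM-D-72B"  # assumed Hub ID; check the model card

# 8-bit weights roughly halve the fp16 footprint (~72 GB for 72B params),
# which still exceeds 24 GB, so device_map="auto" spills layers to CPU RAM.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

def load_model():
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        MODEL_ID,
        quantization_config=quant_config,
        device_map="auto",       # split layers across GPU and CPU automatically
        trust_remote_code=True,  # NVLM-D ships custom modeling code
    )
    return tokenizer, model

if __name__ == "__main__":
    # Warning: this downloads tens of GB of weights; only run it with
    # enough disk space and CPU RAM for the offloaded layers.
    tokenizer, model = load_model()
```

Expect generation to be dominated by CPU-to-GPU transfers once layers are offloaded, so keep prompts and max token counts small while testing.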
Has anybody used an external hard drive (disk offload) to run this?