Inscrutable Issues w/ Torch and Zero-GPU

#93
by WillHeld - opened

Hi Zero-GPU team! @akhaliq reached out to me on Twitter to see if I would like to port our demo at diva-audio.github.io from GCP to Zero-GPU for more stable, long-term, and efficient hosting. I changed the demo code following the docs and the errors I could debug myself, but I've now hit a few snags in the Zero-GPU internals whose root cause I can't figure out, so AK suggested I post here and check with @hysts to see if they can provide any clues on what to try.

Issue 1: torch has no attribute obj

[Screenshot 2024-07-31 at 11.33.20 PM.png]

This issue appeared as soon as I switched to Zero-GPU hosting, and it crashes my app before it launches. I'm seemingly able to resolve it temporarily by going into Dev Mode and removing obj from the GENERIC_METHOD_NAMES list, but I'm not sure what the implications of that are, and it's entirely possible that's what causes issue 2 below. I initially thought this might be a torch versioning issue, but even after pinning my transformers version to one I found in another working Zero-GPU Space, I still hit this.
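For anyone following along, here's my rough mental model of the failure (GENERIC_METHOD_NAMES is the list name from the spaces package source; the rest is illustrative guesswork on my part):

```python
import torch

# Illustrative sketch only: a patcher that wraps every name on its list
# without checking that torch actually defines it will raise
# AttributeError at import time whenever one name is missing.
GENERIC_METHOD_NAMES = ["empty", "zeros", "obj"]  # 'obj' is the entry I removed

for name in GENERIC_METHOD_NAMES:
    if not hasattr(torch, name):  # torch has no attribute 'obj' here
        continue                  # my Dev Mode edit amounts to skipping it
    original = getattr(torch, name)  # the patcher would wrap this
```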

I'm sure it's a setup error on my end, but I can't quite figure out what it is from the current docs or from looking at other working Spaces!

Issue 2: CUDACachingAllocator

If I go into Dev Mode and remove obj, the app launches and renders, but then hits what seems to be the same issue as here: https://huggingface.co./spaces/zero-gpu-explorers/README/discussions/21

I've double-checked that all my to("cuda") calls happen outside the @spaces.GPU contexts, so the solution suggested there doesn't seem to apply in my case. My best guess is that something strange is happening because I'm using a yield rather than a return? But again, I'm not quite sure.
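For context, here's roughly the shape of my code, following the pattern from the ZeroGPU docs (the repo id, model class, and generation calls are simplified placeholders, not my actual code):

```python
import spaces
import torch
from transformers import AutoModelForCausalLM  # placeholder model class

# Weights move to CUDA once at module load, outside any GPU context;
# the spaces package intercepts this and defers the actual transfer.
model = AutoModelForCausalLM.from_pretrained("my-org/my-model").to("cuda")

@spaces.GPU
def respond(inputs):
    with torch.no_grad():
        return model.generate(**inputs)

# My real endpoint is a generator, i.e. it yields partial results
# instead of returning, which is the part I'm unsure about:
@spaces.GPU
def respond_streaming(inputs):
    for token in model.generate(**inputs):  # simplified streaming loop
        yield token
```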

It's also possible this is just a sharp edge, since all the models in the demo (my own DiVA, SALMONN, and Qwen) use custom code rather than models directly supported by HF.

The final possibility is that I need to reduce the VRAM footprint of my demo? Currently, the version we host on GCP uses two 40 GB A100s to run SALMONN, Qwen-Audio, and DiVA simultaneously. I've tried to split up the @spaces.GPU contexts so that it's clear only one model needs to be active at a time and the others can be offloaded to CPU memory or disk, but I'm not sure that's how it works. Roughly, the split looks like the sketch below.
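Sketch of the split (function bodies simplified; the model objects are created at module level as in the sketch above):

```python
import spaces

# One GPU context per model, in the hope that only the model being
# queried needs to be resident on the GPU at any given time.
@spaces.GPU
def run_diva(audio):
    return diva_model.generate(audio)

@spaces.GPU
def run_salmonn(audio):
    return salmonn_model.generate(audio)

@spaces.GPU
def run_qwen(audio):
    return qwen_model.generate(audio)
```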

Zero-GPU (in a good way) seems like magic! Unfortunately, since I don't yet understand from the outside what makes the magic work, I'm a bit stuck, and help would be very much appreciated!!

ZeroGPU Explorers org

Hi @WillHeld
As for issue 1, could you try setting the environment variable ZEROGPU_V2 to true in your Space settings? I think the error is caused by a bug in the spaces package, which is currently being refactored to introduce the new V2.

(I'm not sure about issue 2, but maybe it will be resolved once issue 1 is?)

On it! Done

[Screenshot 2024-08-01 at 12.07.58 AM.png]

Ok - thanks @hysts!

Setting ZEROGPU_V2 fixed the first error. Unfortunately, the Space now gets killed as it boots up.

[Screenshot_20240801-080726.png]

This seems more likely to be a "too much VRAM required" issue. If that sounds right, I can just deactivate one of the models for the Zero-GPU version of the demo so that everything runs on a single 40 GB A100?

ZeroGPU Explorers org

@WillHeld Thanks for checking!
Oh, so the model size is 68.3GB? That's huge. My understanding is that models are offloaded to disk in V2, so maybe the disk is running out. @cbensimon
Anyway, I don't think it would work with V1 either, even if the bug were fixed, because models are offloaded to CPU in V1 and the CPU RAM of a ZeroGPU Space is 64GB.
So, yeah, I think you need to deactivate some of your models to run your Space on ZeroGPU.

ZeroGPU Explorers org

I too find it inscrutable that ZeroGPU flat-out refuses to give us 68GB of free GPU. We are starving over here and y'all are in your Nvidia ivory towers refusing to let me use a measly H100 for free.....

Each model itself is around 23 GB, but we're hosting an arena-style interface, so it comes out to ~70 GB total!

Long term, the design pattern should probably be to host each individual model separately via ZeroGPU, with a unified Space that calls the others via the API, along the lines of the sketch below.
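Something like this with gradio_client, where the arena Space holds no weights itself (Space ids and api_name values are made up for illustration):

```python
from gradio_client import Client

# The unified arena Space just fans each request out to the per-model
# ZeroGPU Spaces over the Gradio API.
diva = Client("WillHeld/diva-demo")        # hypothetical Space ids
salmonn = Client("some-org/salmonn-demo")
qwen = Client("some-org/qwen-audio-demo")

def query_all(audio_path: str):
    return (
        diva.predict(audio_path, api_name="/predict"),
        salmonn.predict(audio_path, api_name="/predict"),
        qwen.predict(audio_path, api_name="/predict"),
    )
```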

I think in the short term, disabling SALMONN so that the demo can run on Spaces is probably the path forward on my end. Thanks!

> I too find it inscrutable that ZeroGPU flat-out refuses to give us 68GB of free GPU. We are starving over here and y'all are in your Nvidia ivory towers refusing to let me use a measly H100 for free.....

Certainly not begrudging the limits! Just trying to understand them :) - after all, we're paying out of our own pocket to host these 3 models rather than just our own so that the community can try them, and we'll happily continue to do so as long as our project budget allows.

I was contacted by an HF employee asking why I hadn't hosted the service on Spaces, and I'm in the process of seeing what version of the demo is possible there :)

ZeroGPU Explorers org

@WillHeld I just thought it was a good opportunity to be funny lol, not trying to give you a hard time over it!
