35 minutes to generate an image

#35

by gimmi9999 - opened 3 days ago

3 days ago

Hey guys, I'm facing the problem that it takes 35 minutes to generate an image.
Is that normal?
The computer I'm using is a really powerful computer.

rustyw007

3 days ago

that sounds like the pipe is going to CPU, not GPU. Are you sure you're using GPU?

Before I got my NVIDIA RTX GPU I ran some tests on CPU only and it did take 30+ minutes.

gimmi9999

3 days ago

are you running it in python?

gimmi9999

3 days ago

from huggingface_hub import login
from diffusers import StableDiffusion3Pipeline
import torch

# Log in to Hugging Face

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = pipe(
    "A capybara holding a sign that reads Hello World",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("capybara.png")

This is the code, but it still takes 35 minutes.

rustyw007

3 days ago

@gimmi9999 - yea my environment is a miniconda/python setup and I invoke with diffusers. When I switched from CPU to "cuda" - even with my basic RTX 4060, I can generate an image from sd35 large in about 5 min (using quant since I have only 8GB VRAM).

I used the example code (diffusers) from the model card page with no modifications and all ran well. Good luck!

Now if I can just figure out why my setup won't run "disconnected" -- but I have my own thread on that topic! :)

rustyw007

3 days ago

It's possible that your CUDA-enabled card is not being detected. What specific card are you using?

Run this code. What output do you get?

import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
print("GPU device:", torch.cuda.get_device_name(0))

For example on my system:

(t2i) PS C:\vscode-projects> python .\chk-cuda-version.py
PyTorch version: 2.5.1
CUDA available: True
CUDA version: 11.8
GPU device: NVIDIA GeForce RTX 4060

gimmi9999

3 days ago

edited 3 days ago

PyTorch version: <module 'torch.version' from 'C:\Users\User\PycharmProjects\pythonProject4\.venv\Lib\site-packages\torch\version.py'>
CUDA available: True
CUDA version: 11.8
GPU device: NVIDIA GeForce RTX 4090

rustyw007

3 days ago

wow. 4090. I wish!

I'm outta suggestions - I wish I could help.

I can offer one final suggestion: while getting everything set up on my system, I started with a much simpler model so I could test code changes with less waiting time. Perhaps start with something simple, verify it's using the CUDA pipe, and then work up?

I started with this:

from diffusers import StableDiffusionPipeline
import torch

# Load the pipeline with torch_dtype=torch.float16 for GPU usage
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16
)

# Move the pipeline to GPU
pipe = pipe.to("cuda")

# Set the resolution
height = 1024
width = 512

# Generate an image
prompt = "a black rose in a yellow vase"
image = pipe(prompt, height=height, width=width).images[0]
image.show()  # Display the image
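
One way to do the "verify it's using the cuda pipe" step is to ask each component of the loaded pipeline where its weights live. The sketch below is only an illustration: `component_devices` is a hypothetical helper name, and it assumes the pipeline exposes a `components` dict (as diffusers pipelines do) whose weight-bearing entries have a `parameters()` method.

```python
def component_devices(pipe):
    """Map each weight-bearing pipeline component to the device of its first parameter."""
    devices = {}
    for name, component in pipe.components.items():
        parameters = getattr(component, "parameters", None)
        if not callable(parameters):
            continue  # tokenizers, schedulers, or missing components hold no weights
        first = next(iter(parameters()), None)
        if first is not None:
            devices[name] = str(first.device)
    return devices
```

Calling `print(component_devices(pipe))` after `pipe.to("cuda")` should list a CUDA device for every entry; any entry reporting `cpu` is not on the GPU. Note that with CPU offloading enabled, components are expected to report `cpu` while idle, so this check is mainly useful for the plain `pipe.to("cuda")` case.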

YaTharThShaRma999

2 days ago

@gimmi9999
Can you try removing pipe.to("cuda") and replacing it with pipe.enable_model_cpu_offload()?
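
Applied to the script posted earlier in the thread, that change might look like the sketch below. The function name `generate_with_offload` is made up for illustration, the imports sit inside the function only so that it can be defined without loading the libraries, and it assumes the `accelerate` package is installed (diffusers relies on it for offloading) and that you are already logged in to Hugging Face.

```python
def generate_with_offload(prompt, model_id="stabilityai/stable-diffusion-3.5-large"):
    """Generate one image using model CPU offload instead of pipe.to("cuda")."""
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    # Used in place of pipe.to("cuda"): each sub-model is moved to the GPU only
    # while it is needed and returned to CPU RAM afterwards.
    pipe.enable_model_cpu_offload()

    return pipe(prompt, num_inference_steps=28, guidance_scale=3.5).images[0]
```

Usage would be along the lines of `generate_with_offload("A capybara holding a sign that reads Hello World").save("capybara.png")`.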

I think the issue is that some of your VRAM is already occupied. The model probably doesn't fit on your GPU and is most likely spilling into shared VRAM, which massively slows down inference. The change above should lower VRAM usage so that shared VRAM isn't used.
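
A rough back-of-the-envelope estimate shows why the weights alone could be too much for the 24 GB on an RTX 4090. The parameter counts below are approximate figures assumed for illustration, not measurements from this thread, and the estimate ignores the VAE, activations, and anything else already using VRAM.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}

# Approximate parameter counts (assumptions for illustration only).
ASSUMED_PARAMS = {
    "transformer": 8.1e9,
    "text_encoder_3 (T5-XXL)": 4.7e9,
    "text_encoder_2 (CLIP-G)": 0.7e9,
    "text_encoder (CLIP-L)": 0.12e9,
}

def weights_gib(param_count, dtype="bfloat16"):
    """Memory needed to hold the weights alone, in GiB."""
    return param_count * BYTES_PER_PARAM[dtype] / 1024**3

def total_weights_gib(params, dtype="bfloat16"):
    """Sum of the weight memory of all listed components, in GiB."""
    return sum(weights_gib(count, dtype) for count in params.values())

total = total_weights_gib(ASSUMED_PARAMS)
print(f"Estimated weights in bfloat16: {total:.1f} GiB")
print("Fits in 24 GiB at once:", total < 24)
```

Under these assumed counts the total comes out slightly above 24 GiB, which would be consistent with the driver falling back to shared memory when everything is moved to the GPU at once.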

gimmi9999

1 day ago

Worked!!!
Thank you!
Have any of you tried fine-tuning with a few images?
Or with a small dataset of around 10 photos?
