35 minutes to generate an image

#35
by gimmi9999 - opened

Hi guys, I'm facing a problem: it takes 35 minutes to generate an image.
Is that normal?
The computer I'm using is really powerful.

That sounds like the pipeline is running on the CPU, not the GPU. Are you sure you're using the GPU?

Before I got my NVIDIA RTX GPU I ran some tests on CPU only and it did take 30+ minutes.

Are you running it in Python?

from huggingface_hub import login
from diffusers import StableDiffusion3Pipeline
import torch

# Log in to Hugging Face
login(token="--------------------------------")

# Load the model in bfloat16 and move it to the GPU
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = pipe(
    "A capybara holding a sign that reads Hello World",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("capybara.png")

This is the code, but it still takes 35 minutes.

@gimmi9999 - yeah, my environment is a miniconda/Python setup and I invoke the model with diffusers. Since switching from CPU to "cuda", even my basic RTX 4060 generates an image from SD 3.5 Large in about 5 minutes (using a quantized model, since I have only 8GB of VRAM).

I used the example code (diffusers) from the model card page with no modifications and all ran well. Good luck!

Now if I can just figure out why my setup won't run "disconnected" -- but I have my own thread on that topic! :)

It's possible that your CUDA-enabled card is not being detected. What specific card are you using?

Run this code. What output do you get?

import torch

# torch.__version__ is the version string; torch.version is a module
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
print("GPU device:", torch.cuda.get_device_name(0))

For example on my system:

(t2i) PS C:\vscode-projects> python .\chk-cuda-version.py
PyTorch version: 2.5.1
CUDA available: True
CUDA version: 11.8
GPU device: NVIDIA GeForce RTX 4060

PyTorch version: <module 'torch.version' from 'C:\Users\User\PycharmProjects\pythonProject4\.venv\Lib\site-packages\torch\version.py'>
CUDA available: True
CUDA version: 11.8
GPU device: NVIDIA GeForce RTX 4090

wow. 4090. I wish!

I'm outta suggestions - I wish I could help.

I can offer one final suggestion: while getting everything set up on my system, I started with a much simpler model so I could test code changes with less waiting time. Perhaps start with something simple, verify it's using the CUDA device, and then work up?

I started with this:

from diffusers import StableDiffusionPipeline
import torch

# Load the pipeline with torch_dtype=torch.float16 for GPU usage
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16
)

# Move the pipeline to GPU
pipe = pipe.to("cuda")

# Set the resolution
height = 1024
width = 512

# Generate an image
prompt = "a black rose in a yellow vase"
image = pipe(prompt, height=height, width=width).images[0]
image.show()  # Display the image

@gimmi9999
Can you try removing pipe.to("cuda") and replacing it with pipe.enable_model_cpu_offload()?

I think the issue is that some of your VRAM is already occupied. The model probably doesn't fit on your GPU and is most likely spilling into shared VRAM, which massively slows down inference. The change above should lower VRAM usage enough to avoid shared VRAM.
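For a rough sense of why the model might not fit, here is a back-of-the-envelope sketch. The parameter counts are approximate assumptions on my part (about 8.1B for the SD 3.5 Large transformer, about 4.7B for the T5 text encoder; the smaller CLIP encoders and the VAE are ignored), and the 2 GiB headroom for activations and other processes is a made-up figure, so treat the result as an estimate only. The helper names are hypothetical, not part of diffusers.

```python
# Rough estimate of whether model weights fit in VRAM.
# All parameter counts and the headroom value are assumptions for illustration.

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weights_gib(num_params, dtype):
    """Memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


def free_vram_gib():
    """Free VRAM on the current GPU in GiB, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes / 1024**3


# Approximate parameter counts (assumptions, smaller components ignored).
components = {
    "transformer": 8.1e9,
    "t5_text_encoder": 4.7e9,
}

vram_gib = 24.0      # nominal VRAM of an RTX 4090
headroom_gib = 2.0   # hypothetical reserve for activations and other apps

total = sum(weights_gib(n, "bfloat16") for n in components.values())
fits = total + headroom_gib <= vram_gib

print(f"Weights in bfloat16: about {total:.1f} GiB")
print(f"Fits in {vram_gib:.0f} GiB with {headroom_gib:.0f} GiB headroom: {fits}")
print("Free VRAM right now (GiB):", free_vram_gib())
```

If the estimate lands near or above the card's capacity, as it does here, offloading idle components to system RAM with pipe.enable_model_cpu_offload() is a reasonable thing to try.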

Worked!!!
Thank you!
Has any of you tried fine-tuning with just a few images,
or with a small dataset of around 10 photos?
