What are the options for cloning the Space?

What's the cheapest GPU for cloning the Space? I tried a few, but it needed too much VRAM. At the same time, some people here claim it runs on 24GB. Am I understanding correctly that it needs more than 50GB?

It runs fine on 24GB. I'm not sure how it's loaded when you clone the Space, but running it locally with 24GB is fine.

Okay, I'll try again and post the errors here if it doesn't work.

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 54.00 MiB. GPU 0 has a total capacity of 21.96 GiB of which 47.06 MiB is free. Including non-PyTorch memory, this process has 0 bytes memory in use. Of the allocated memory 21.72 GiB is allocated by PyTorch, and 9.69 MiB is reserved by PyTorch but unallocated.

During inference, it uses a bit more VRAM than right after the model is first loaded. Try adding the line "pipeline.enable_model_cpu_offload()" right after you initialize your pipeline, so that model components stay on the CPU and are moved to the GPU only while they are needed. You can also try setting 'device_map = "balanced"', though in my experience that didn't help.
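
To make the error message above easier to read, here is a small sketch that pulls the numbers out of it. The helper name parse_cuda_oom and the regular expressions are my own assumptions, written against the wording of this particular message; other PyTorch versions may phrase it differently.

```python
import re

# PyTorch reports sizes in binary units.
_UNITS = {"bytes": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}
_SIZE = r"([\d.]+) (bytes|KiB|MiB|GiB)"


def parse_cuda_oom(message: str) -> dict:
    """Extract the main byte counts from a CUDA out-of-memory message.

    Hypothetical helper: the patterns match the message quoted in this
    thread and are not an official PyTorch API.
    """
    patterns = {
        "requested": rf"Tried to allocate {_SIZE}",
        "total": rf"total capacity of {_SIZE}",
        "free": rf"of which {_SIZE} is free",
        "allocated": rf"{_SIZE} is allocated by PyTorch",
        "reserved_unallocated": rf"{_SIZE} is reserved by PyTorch",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            result[key] = float(match.group(1)) * _UNITS[match.group(2)]
    return result


# The message posted above, split across lines for readability.
OOM_MESSAGE = (
    "torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 54.00 MiB. "
    "GPU 0 has a total capacity of 21.96 GiB of which 47.06 MiB is free. "
    "Including non-PyTorch memory, this process has 0 bytes memory in use. "
    "Of the allocated memory 21.72 GiB is allocated by PyTorch, and 9.69 MiB "
    "is reserved by PyTorch but unallocated."
)

info = parse_cuda_oom(OOM_MESSAGE)
# Fraction of the card already holding PyTorch tensors when the request failed.
used_fraction = info["allocated"] / info["total"]
print(f"requested {info['requested'] / 2**20:.2f} MiB, "
      f"free {info['free'] / 2**20:.2f} MiB, "
      f"{used_fraction:.1%} of the GPU already allocated")
```

Read this way, the failure is a small one: the 54 MiB request is only slightly larger than the 47.06 MiB still free, because almost all of the 21.96 GiB was already taken by the loaded model. That fits the advice to offload rather than to look for a much bigger card.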

You might want to update your transformers library from 4.22.0 to the newer 4.44.0, since this is a more recent model.
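
If you want to check which version is actually installed before upgrading, something like the following should work. It is a minimal sketch using only the standard library; the helper names are made up for illustration, and 4.44.0 is simply the version mentioned above, not a documented minimum.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v: str) -> tuple:
    """Turn '4.44.0' into (4, 44, 0), ignoring suffixes such as '.dev0'."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so that 4.9.0 < 4.44.0."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("transformers")
if found is None:
    print("transformers is not installed")
elif is_at_least(found, "4.44.0"):
    print(f"transformers {found} is recent enough")
else:
    print(f"transformers {found} is older than 4.44.0, consider upgrading")
```

The versions are compared as tuples of integers rather than as strings, because a plain string comparison would rank "4.9.0" above "4.44.0".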

Here is the code to quantize the model to fp8. It'll run on 16GB of VRAM without quality loss (loading slowly uses up ~50GB of system RAM, then the model is sent to the GPU, where it takes up just 16GB of VRAM):

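import torch
from optimum.quanto import freeze, qfloat8, quantize
from diffusers import FlowMatchEulerDiscreteScheduler, AutoencoderKL
from diffusers.models.transformers.transformer_flux import FluxTransformer2DModel
from diffusers.pipelines.flux.pipeline_flux import FluxPipeline
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5TokenizerFast

dtype = torch.bfloat16

bfl_repo = "black-forest-labs/FLUX.1-dev"

# Load each component separately so the two largest models can be quantized
# before the pipeline is assembled.
scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(bfl_repo, subfolder="scheduler")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=dtype)
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder_2 = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", torch_dtype=dtype)
# Tokenizers hold no weights, so no torch_dtype is needed here.
tokenizer_2 = T5TokenizerFast.from_pretrained(bfl_repo, subfolder="tokenizer_2")
vae = AutoencoderKL.from_pretrained(bfl_repo, subfolder="vae", torch_dtype=dtype)
transformer = FluxTransformer2DModel.from_pretrained(bfl_repo, subfolder="transformer", torch_dtype=dtype)

# Quantize the weights to fp8, then freeze so the quantized weights replace
# the original bf16 ones.
quantize(transformer, weights=qfloat8)
freeze(transformer)

quantize(text_encoder_2, weights=qfloat8)
freeze(text_encoder_2)

# Assemble the pipeline from the components above. The post was truncated at
# this call, so the argument list is an assumed completion.
pipeline = FluxPipeline(
    scheduler=scheduler,
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    text_encoder_2=text_encoder_2,
    tokenizer_2=tokenizer_2,
    transformer=transformer,
)
# Offload as suggested earlier in the thread.
pipeline.enable_model_cpu_offload()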
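
As a rough sanity check on the 16GB figure, here is a back-of-the-envelope estimate of the weight memory at different precisions. The parameter counts are approximate assumptions on my part (about 12B for the FLUX.1-dev transformer and about 4.7B for the T5 text encoder), not numbers from this thread. The estimate covers weights only and ignores activations, quantization scales, the VAE, and the CLIP encoder.

```python
# Bytes used to store one weight at each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float8": 1}

# Approximate parameter counts (assumptions for illustration only).
PARAMS = {
    "transformer": 12e9,
    "text_encoder_2": 4.7e9,
}


def weight_gib(num_params: float, dtype: str) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


def total_weight_gib(dtype: str) -> float:
    """Sum of the weight memory of the components listed in PARAMS."""
    return sum(weight_gib(n, dtype) for n in PARAMS.values())


for dtype in BYTES_PER_PARAM:
    per_model = ", ".join(
        f"{name}: {weight_gib(n, dtype):.1f} GiB" for name, n in PARAMS.items()
    )
    print(f"{dtype:>8}: total {total_weight_gib(dtype):.1f} GiB ({per_model})")
```

Under these assumptions, going from bfloat16 to fp8 halves the weight memory, which is at least consistent with the quantized model fitting on a 16GB card when offloading is used, and with the unquantized one being tight on a 24GB card.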

Not sure if I can use that on HF, but I'll look into it. For now, my free GPU availability here has gotten much better.

It didn't work, so I'm wrapping this up. Maybe I'll clone a Space if someone makes one, but right now it isn't even necessary. There's enough GPU available.

