Flux Edit

Example edit prompts (each shown with its edited result on the model page; images omitted):

  • Give this the look of a traditional Japanese woodblock print.
  • transform the setting to a winter scene
  • turn the color of mushroom to gray
  • Change it to look like it's in the style of an impasto painting.

These are control weights for image editing, fine-tuned from black-forest-labs/FLUX.1-dev on the TIGER-Lab/OmniEdit-Filtered-1.2M dataset using the Flux Control framework.

License

Please adhere to the licensing terms as described here.

Intended uses & limitations

Inference
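
The snippet below loads the edit transformer into a FluxControlPipeline from diffusers and applies one of the example prompts to a source image: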

from diffusers import FluxControlPipeline, FluxTransformer2DModel
from diffusers.utils import load_image
import torch 

path = "sayakpaul/FLUX.1-dev-edit-v0" 
edit_transformer = FluxTransformer2DModel.from_pretrained(path, torch_dtype=torch.bfloat16)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
).to("cuda")

url = "https://huggingface.co./datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/assets/mushroom.jpg"
image = load_image(url) # resize as needed.
print(image.size)

prompt = "turn the color of mushroom to gray"
image = pipeline(
    control_image=image,
    prompt=prompt,
    guidance_scale=30., # change this as needed.
    num_inference_steps=50, # change this as needed.
    max_sequence_length=512,
    height=image.height,
    width=image.width,
    generator=torch.manual_seed(0)
).images[0]
image.save("edited_image.png")
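
If you are short on VRAM, diffusers' standard model-level CPU offloading can be used instead of moving the whole pipeline to the GPU; this is a generic diffusers feature, not something specific to this model:

# replace the .to("cuda") call above with model-level CPU offloading;
# sub-models are moved to the GPU only while they are running
pipeline.enable_model_cpu_offload()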

Speeding inference with a turbo LoRA

We can speed up inference by using a turbo LoRA such as ByteDance/Hyper-SD, which lets us reduce num_inference_steps while still producing a good image.

Make sure to install peft before running the code below: pip install -U peft.

from diffusers import FluxControlPipeline, FluxTransformer2DModel
from diffusers.utils import load_image
from huggingface_hub import hf_hub_download
import torch

path = "sayakpaul/FLUX.1-dev-edit-v0"
edit_transformer = FluxTransformer2DModel.from_pretrained(path, torch_dtype=torch.bfloat16)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
).to("cuda")

# load the turbo LoRA
pipeline.load_lora_weights(
    hf_hub_download("ByteDance/Hyper-SD", "Hyper-FLUX.1-dev-8steps-lora.safetensors"), adapter_name="hyper-sd"
)
pipeline.set_adapters(["hyper-sd"], adapter_weights=[0.125])


url = "https://huggingface.co./datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/assets/mushroom.jpg"
image = load_image(url) # resize as needed.
print(image.size)

prompt = "turn the color of mushroom to gray"
image = pipeline(
    control_image=image,
    prompt=prompt,
    guidance_scale=30., # change this as needed.
    num_inference_steps=8, # change this as needed.
    max_sequence_length=512,
    height=image.height,
    width=image.width,
    generator=torch.manual_seed(0)
).images[0]
image.save("edited_image.png")

Comparison

Side-by-side results at 50 inference steps vs. 8 steps across four example edits (images omitted).

If the memory requirements still cannot be satisfied on your hardware, you can also perform quantization. Refer to the Diffusers documentation to learn more.
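
As a minimal sketch (assuming a recent diffusers with bitsandbytes support; the 4-bit NF4 settings below are illustrative, not a recommendation from this model card), the edit transformer can be quantized on load like this:

from diffusers import FluxControlPipeline, FluxTransformer2DModel, BitsAndBytesConfig
import torch

# 4-bit settings are illustrative; requires `bitsandbytes` and `accelerate`
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
edit_transformer = FluxTransformer2DModel.from_pretrained(
    "sayakpaul/FLUX.1-dev-edit-v0",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()  # avoid .to("cuda") with a quantized component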

guidance_scale also impacts the results:

Edited results for each prompt at guidance_scale values of 10, 20, 30, and 40 (images omitted):

  • Give this the look of a traditional Japanese woodblock print.
  • transform the setting to a winter scene
  • turn the color of mushroom to gray
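
To reproduce such a grid, you can sweep guidance_scale with a fixed seed; this sketch assumes image still holds the loaded control image and prompt is defined as in the inference example above:

# sweep guidance_scale with a fixed seed so only the scale varies
for gs in (10, 20, 30, 40):
    edited = pipeline(
        control_image=image,
        prompt=prompt,
        guidance_scale=gs,
        num_inference_steps=50,
        max_sequence_length=512,
        height=image.height,
        width=image.width,
        generator=torch.manual_seed(0),
    ).images[0]
    edited.save(f"edited_gs_{gs}.png")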

Limitations and bias

Expect the model to perform underwhelmingly in some cases: the exact training details of the original Flux Control models are not public, so the fine-tuning recipe here may not match them.

Training details

The fine-tuning codebase is here. Training hyperparameters:

  • Per GPU batch size: 4
  • Gradient accumulation steps: 4
  • Guidance scale: 30
  • BF16 mixed-precision
  • AdamW optimizer (8bit from bitsandbytes)
  • Constant learning rate of 5e-5
  • Weight decay of 1e-6
  • 20000 training steps

Training was conducted on a single node of 8×H100 GPUs.

We used a simplified flow-matching mechanism to perform the linear interpolation between the clean latents and noise. In pseudo-code, that looks like:

# sample one interpolation factor per example, uniformly in [0, 1]
sigmas = torch.rand(batch_size)
timesteps = (sigmas * noise_scheduler.config.num_train_timesteps).long()
...

# linearly interpolate between the clean latents and Gaussian noise
noisy_model_input = (1.0 - sigmas) * pixel_latents + sigmas * noise

where pixel_latents is computed from the source images and noise is drawn from a Gaussian distribution. For more details, check out the repository.
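
As a self-contained sketch of that interpolation (the shapes and names below are illustrative assumptions, not the actual training code; note that sigmas must be broadcast over the latent dimensions, which the pseudo-code elides):

import torch

batch_size, channels, height, width = 4, 16, 64, 64  # illustrative shapes only
pixel_latents = torch.randn(batch_size, channels, height, width)  # stand-in for VAE-encoded images
noise = torch.randn_like(pixel_latents)

sigmas = torch.rand(batch_size)
sigmas_b = sigmas.view(-1, 1, 1, 1)  # broadcast per-example sigma over C, H, W
noisy_model_input = (1.0 - sigmas_b) * pixel_latents + sigmas_b * noise

# in flow matching, the regression target is the velocity from data to noise
target = noise - pixel_latents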
