---
tags:
- text-to-image
- stable-diffusion
---
# Control-LoRA Model Card

## Introduction
What's better than ControlNets for SDXL? ControlNets... but more efficient.

By applying low-rank parameter-efficient fine-tuning (PEFT) to control networks, we introduce Control-LoRAs.

Combining the strengths of ControlNet and PEFT, this approach offers a more compact and efficient way to bring model control to a wider variety of consumer GPUs.
For each model below, you'll find Rank 256 files (reducing the ~4.7 GB ControlNets to ~738 MB) and experimental, ultra-pruned Rank 128 files (reducing to ~377 MB).
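As a rough illustration of why the low-rank files are so much smaller, the sketch below compares a dense weight update for a single layer against a rank-constrained factorization of it. The layer shape and rank here are placeholders, not the actual Control-LoRA dimensions:

```python
import torch

# Illustrative only: a dense fine-tuned weight delta for one layer vs. its
# low-rank factorization. Shapes and rank are hypothetical.
d_out, d_in, rank = 1280, 1280, 256

full_delta = torch.randn(d_out, d_in)          # dense update: d_out * d_in params
lora_down  = torch.randn(rank, d_in) * 0.01    # "down" (A) matrix
lora_up    = torch.zeros(d_out, rank)          # "up" (B) matrix, zero-initialized

low_rank_delta = lora_up @ lora_down           # reconstructed update, rank <= 256

dense_params = full_delta.numel()
lora_params = lora_down.numel() + lora_up.numel()
print(f"dense: {dense_params:,} params, rank-{rank} LoRA: {lora_params:,} params "
      f"({lora_params / dense_params:.1%} of dense)")

# At inference the frozen control-network weight W is applied as W + up @ down,
# which is why the distributed files are a fraction of a full ControlNet's size.
```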
Each Control-LoRA has been trained on a diverse range of image concepts and aspect ratios.
## MiDaS and ClipDrop Depth
Depth estimation is an image processing technique that determines the distance of objects in a scene, providing a depth map that highlights variations in proximity.
In the example above, we compare the depth results of MiDaS dpt_beit_large_512 with ClipDrop Depth for portraits, and their subsequent use in the Depth Control-LoRA.
The Control-LoRA utilizes a grayscale depth map for guided generation.
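A suitable depth map can be produced with any MiDaS-family estimator. The sketch below uses the Hugging Face depth-estimation pipeline with a public MiDaS 3.1 checkpoint (Intel/dpt-beit-large-512) as a stand-in; the file names are placeholders:

```python
from transformers import pipeline
from PIL import Image

# Sketch: produce a grayscale depth map to condition the Depth Control-LoRA.
# Any comparable depth estimator can be substituted for this checkpoint.
depth_estimator = pipeline("depth-estimation", model="Intel/dpt-beit-large-512")

image = Image.open("portrait.png").convert("RGB")
result = depth_estimator(image)

# The pipeline returns a PIL depth image; save it for use as the control image.
depth_map = result["depth"].convert("L")
depth_map.save("portrait_depth.png")
```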
## Canny Edge
Canny Edge Detection is an image processing technique that identifies abrupt changes in intensity to highlight edges in an image.
This Control-LoRA uses the edges from an image to guide the generation.
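A typical way to produce the edge map is OpenCV's Canny detector; the thresholds and file names in the sketch below are placeholders rather than values prescribed by this card:

```python
import cv2
import numpy as np
from PIL import Image

# Sketch: extract Canny edges to use as the control image.
image = np.array(Image.open("input.png").convert("RGB"))
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
edges = cv2.Canny(gray, 100, 200)            # single-channel edge map

# Stack to three channels so it can be fed in like any other conditioning image.
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))
control_image.save("input_canny.png")
```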
## Photograph and Sketch Colorizer
These two Control-LoRAs can be used to colorize images.
The first is designed to colorize black and white photographs.
The second is designed to color in sketches provided as white-on-black images (either hand-drawn, or created with a SoftEdge_PIDI model).
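Preparing the two kinds of inputs is straightforward; the sketch below (with placeholder file names) converts a photo to grayscale and inverts a black-on-white drawing into the expected white-on-black form:

```python
from PIL import Image, ImageOps

# 1) The photograph colorizer expects a black-and-white photo.
bw_photo = Image.open("photo.png").convert("L").convert("RGB")
bw_photo.save("photo_bw.png")

# 2) The sketch colorizer expects white lines on a black background. If the
#    sketch is drawn black-on-white, invert it first.
sketch = Image.open("sketch.png").convert("L")
white_on_black = ImageOps.invert(sketch).convert("RGB")
white_on_black.save("sketch_white_on_black.png")
```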
## Revision
Revision is a novel approach to prompting SDXL with images.

It uses pooled CLIP embeddings to produce images conceptually similar to the input. It can be used either in addition to text prompts or in place of them.
Revision also includes a blending function for combining multiple image or text concepts, as either positive or negative prompts.
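To give a sense of the mechanism, the sketch below pools CLIP image embeddings and blends two of them with arbitrary weights. The CLIP checkpoint, file names, and blending rule here are illustrative placeholders and not necessarily what Revision uses internally:

```python
import torch
from PIL import Image
from transformers import CLIPVisionModelWithProjection, CLIPImageProcessor

# Placeholder CLIP vision checkpoint; the embedding model Revision relies on may differ.
model_id = "openai/clip-vit-large-patch14"
processor = CLIPImageProcessor.from_pretrained(model_id)
vision_model = CLIPVisionModelWithProjection.from_pretrained(model_id)

def pooled_embedding(path: str) -> torch.Tensor:
    """Return the pooled, projected CLIP embedding for one image."""
    inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        return vision_model(**inputs).image_embeds

# Blend two image concepts with hypothetical weights; conceptually, the result
# stands in for (or accompanies) a text-prompt embedding during sampling.
emb_a = pooled_embedding("concept_a.png")
emb_b = pooled_embedding("concept_b.png")
blended = 0.7 * emb_a + 0.3 * emb_b
print(blended.shape)
```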