patrickvonplaten committed
Commit 9b12cba
Parent: 4596fc2

Add 1 files

Files changed (1)
  1. README.md +6 -153
README.md CHANGED
@@ -1,157 +1,10 @@
-
  ---
- license: openrail++
- base_model: stabilityai/stable-diffusion-xl-base-1.0
  tags:
- - stable-diffusion-xl
- - stable-diffusion-xl-diffusers
  - text-to-image
- - diffusers
- - controlnet
- inference: false
  ---
-
- # SDXL-controlnet: Zoe-Depth
-
- These are ControlNet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with Zoe-Depth conditioning. [Zoe-Depth](https://github.com/isl-org/ZoeDepth) is an open-source SOTA depth-estimation model that produces high-quality depth maps, which are better suited for conditioning.
-
- You can find some example images below.
-
- ![images_0](./zoe-depth-example.png)
-
- ![images_2](./zoe-megatron.png)
-
- ![images_3](./photo-woman.png)
-
- ## Usage
-
- First, make sure to install the required libraries:
-
- ```bash
- pip install accelerate transformers safetensors diffusers
- ```
-
- Then set up the Zoe-Depth model:
-
- ```python
- import torch
- import matplotlib
- import matplotlib.cm
- import numpy as np
- from PIL import Image
-
- torch.hub.help("intel-isl/MiDaS", "DPT_BEiT_L_384", force_reload=True)  # triggers a fresh download of the MiDaS repo
- model_zoe_n = torch.hub.load("isl-org/ZoeDepth", "ZoeD_NK", pretrained=True).eval()
- model_zoe_n = model_zoe_n.to("cuda")
-
-
- def colorize(value, vmin=None, vmax=None, cmap='gray_r', invalid_val=-99, invalid_mask=None, background_color=(128, 128, 128, 255), gamma_corrected=False, value_transform=None):
-     if isinstance(value, torch.Tensor):
-         value = value.detach().cpu().numpy()
-
-     value = value.squeeze()
-     if invalid_mask is None:
-         invalid_mask = value == invalid_val
-     mask = np.logical_not(invalid_mask)
-
-     # normalize to the 2nd..85th percentile of the valid values
-     vmin = np.percentile(value[mask], 2) if vmin is None else vmin
-     vmax = np.percentile(value[mask], 85) if vmax is None else vmax
-     if vmin != vmax:
-         value = (value - vmin) / (vmax - vmin)  # vmin..vmax
-     else:
-         # avoid division by zero
-         value = value * 0.
-
-     value[invalid_mask] = np.nan
-     cmapper = matplotlib.cm.get_cmap(cmap)
-     if value_transform:
-         value = value_transform(value)
-     value = cmapper(value, bytes=True)  # RGBA array of shape (n, m, 4)
-
-     # grey out the invalid values
-     img = value
-     img[invalid_mask] = background_color
-
-     # gamma correction
-     img = img / 255
-     img = np.power(img, 2.2)
-     img = img * 255
-     img = img.astype(np.uint8)
-     img = Image.fromarray(img)
-     return img
-
-
- def get_zoe_depth_map(image):
-     with torch.autocast("cuda", enabled=True):
-         depth = model_zoe_n.infer_pil(image)
-     depth = colorize(depth, cmap="gray_r")
-     return depth
- ```
-
- Now we're ready to go:
-
- ```python
- import torch
- import numpy as np
- from PIL import Image
-
- from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
- from diffusers.utils import load_image
-
- controlnet = ControlNetModel.from_pretrained(
-     "diffusers/controlnet-zoe-depth-sdxl-1.0",
-     use_safetensors=True,
-     torch_dtype=torch.float16,
- )
- vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
- pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
-     "stabilityai/stable-diffusion-xl-base-1.0",
-     controlnet=controlnet,
-     vae=vae,
-     variant="fp16",
-     use_safetensors=True,
-     torch_dtype=torch.float16,
- )
- # model offloading moves each sub-model to the GPU only when it is needed,
- # so don't move the pipeline to CUDA manually beforehand
- pipe.enable_model_cpu_offload()
-
-
- prompt = "pixel-art margot robbie as barbie, in a coupé . low-res, blocky, pixel art style, 8-bit graphics"
- negative_prompt = "sloppy, messy, blurry, noisy, highly detailed, ultra textured, photo, realistic"
- image = load_image("https://media.vogue.fr/photos/62bf04b69a57673c725432f3/3:2/w_1793,h_1195,c_limit/rev-1-Barbie-InstaVert_High_Res_JPEG.jpeg")
-
- controlnet_conditioning_scale = 0.55
-
- depth_image = get_zoe_depth_map(image).resize((1088, 896))
-
- generator = torch.Generator("cuda").manual_seed(978364352)
- images = pipe(
-     prompt,
-     negative_prompt=negative_prompt,
-     image=depth_image,
-     num_inference_steps=50,
-     controlnet_conditioning_scale=controlnet_conditioning_scale,
-     generator=generator,
- ).images
-
- images[0].save("pixel-barbie.png")
- ```
-
- ![images_1](./barbie.png)
-
- For more details, check out the official documentation of [`StableDiffusionXLControlNetPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/controlnet_sdxl).
-
- ### Training
-
- Our training script was built on top of the official training script that we provide [here](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md).
-
- #### Training data and compute
- The model was trained on 3M image-text pairs from LAION-Aesthetics V2, for 700 GPU hours on 80GB A100 GPUs.
-
- #### Batch size
- Data-parallel training with a per-GPU batch size of 8, for a total batch size of 256 (i.e., 32 GPUs).
-
- #### Hyperparameters
- Constant learning rate of 1e-5.
-
- #### Mixed precision
- fp16
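-
- As a point of reference, the settings above translate into a launch command along the following lines. This is a minimal sketch, not the exact command used for this checkpoint: it assumes the `train_controlnet_sdxl.py` script from the examples folder linked above, and the dataset name and output directory are placeholders.
-
- ```bash
- # Hypothetical invocation matching the reported hyperparameters;
- # --dataset_name and --output_dir below are placeholders, not the actual values used.
- accelerate launch train_controlnet_sdxl.py \
-   --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
-   --dataset_name="<laion-aesthetics-v2-subset>" \
-   --output_dir="controlnet-zoe-depth-sdxl" \
-   --mixed_precision="fp16" \
-   --learning_rate=1e-5 \
-   --train_batch_size=8
- ```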
 
 
  ---
+ language:
+ - en
  tags:
+ - stable-diffusion
  - text-to-image
+ license: creativeml-openrail-m
+ inference: true
  ---
+ The model was trained on 3M image-text pairs from LAION-Aesthetics V2, for 700 GPU hours on 80GB A100 GPUs.