ByzHero committed on
Commit
c7d4752
·
1 Parent(s): 3744bc8
Files changed (3)
  1. README.md +73 -0
  2. config.json +16 -0
  3. diffusion_pytorch_model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,73 @@
+ # SD3 Controlnet softedge
+ The softedge ControlNet is fine-tuned from SD3-medium. It was trained on 12M images drawn from open-source and internal e-commerce datasets, and achieves good performance on both general and e-commerce image generation. It supports preprocessors such as pidinet and hed, as well as their safe modes.
+
+
+ ## Examples
+ From left to right: pidinet preprocessor, ours with pidinet, hed preprocessor, ours with hed.
+
+ `pidinet`|`controlnet`|`hed`|`controlnet`
+ :--:|:--:|:--:|:--:
+ ![images)](./images/im1_1.webp) | ![images)](./images/im1_2.webp) | ![images)](./images/im1_3.webp) | ![images)](./images/im1_4.webp)
+ ![images)](./images/im2_1.webp) | ![images)](./images/im2_2.webp) | ![images)](./images/im2_3.webp) | ![images)](./images/im2_4.webp)
+ ![images)](./images/im3_1.webp) | ![images)](./images/im3_2.webp) | ![images)](./images/im3_3.webp) | ![images)](./images/im3_4.webp)
+ ![images)](./images/im4_1.webp) | ![images)](./images/im4_2.webp) | ![images)](./images/im4_3.webp) | ![images)](./images/im4_4.webp)
+ ![images)](./images/im5_1.webp) | ![images)](./images/im5_2.webp) | ![images)](./images/im5_3.webp) | ![images)](./images/im5_4.webp)
+
+
+
+ ## Usage with Diffusers
+ ```python
+ import torch
+ from diffusers.utils import load_image
+ from diffusers.models import SD3ControlNetModel
+ from diffusers import StableDiffusion3ControlNetPipeline
+ from controlnet_aux import PidiNetDetector
+
+ # Load the ControlNet weights and attach them to the SD3-medium pipeline in FP16.
+ controlnet = SD3ControlNetModel.from_pretrained(
+     "alimama-creative/SD3-Controlnet-Softedge", torch_dtype=torch.float16
+ )
+ pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-3-medium-diffusers",
+     controlnet=controlnet,
+     variant="fp16",
+     torch_dtype=torch.float16,
+ )
+ pipe.text_encoder.to(torch.float16)
+ pipe.controlnet.to(torch.float16)
+ pipe.to("cuda")
+
+ # Source image from which the soft-edge map is extracted.
+ image = load_image(
+     "https://huggingface.co/alimama-creative/SD3-Controlnet-Softedge/resolve/main/images/im1_0.png"
+ )
+ prompt = "A dog sitting on a park bench."
+ width = 1024
+ height = 1024
+
+ # Extract the soft-edge control image with the pidinet preprocessor.
+ edge_processor = PidiNetDetector.from_pretrained("lllyasviel/Annotators")
+ edge_image = edge_processor(image, detect_resolution=width, image_resolution=width)
+
+ # Generate an image conditioned on both the prompt and the edge map.
+ res_image = pipe(
+     prompt=prompt,
+     negative_prompt="deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, mutated hands and fingers, disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation, NSFW",
+     height=height,
+     width=width,
+     control_image=edge_image,
+     num_inference_steps=25,
+     controlnet_conditioning_scale=0.95,
+     guidance_scale=5,
+ ).images[0]
+ res_image.save("sd3.png")
+
+ ```
+
+ ## Training Detail
+ The model was trained on 12M images from laion2B and internal sources, filtered to an aesthetic score of 6+, for 20k steps at a resolution of 1024x1024. ControlNets with 6, 12, and 23 layers were explored; the 12-layer model achieves a good balance between performance and model size, so we release the 12-layer model.
+
+ Mixed precision : FP16<br/>
+ Learning rate : 1e-4<br/>
+ Batch size : 256<br/>
+ Timestep sampling mode : 'logit_normal'<br/>
+ Loss : Flow Matching<br/>
+
+ ## LICENSE
+ The model is fine-tuned from SD3; therefore, its license follows the original SD3 license.
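
Note on the Training Detail section of README.md: it names the timestep sampling mode ('logit_normal') and the loss (Flow Matching) but does not spell out either. The sketch below illustrates one common reading, assuming the rectified-flow formulation usually associated with SD3: a timestep is drawn by passing a Gaussian sample through a sigmoid, the noisy sample is a linear blend of data and noise, and the regression target is their difference. The function names, the `mean=0.0` / `std=1.0` defaults, and the use of plain Python lists are illustrative assumptions, not the authors' training code.

```python
import math
import random


def sample_logit_normal_timestep(rng, mean=0.0, std=1.0):
    """Draw t in (0, 1) by applying a sigmoid to a Gaussian sample.

    mean and std are assumed defaults; the README does not state them.
    """
    u = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-u))


def flow_matching_pair(x0, noise, t):
    """Return (x_t, target) for a rectified-flow style objective.

    x_t blends data and noise linearly; the target is the velocity noise - x0.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]
    target = [b - a for a, b in zip(x0, noise)]
    return x_t, target


def mse(pred, target):
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


rng = random.Random(0)
timesteps = [sample_logit_normal_timestep(rng) for _ in range(10_000)]
x_t, target = flow_matching_pair([1.0, 2.0], [3.0, 0.0], 0.5)
```

With a zero-mean Gaussian, the sampled timesteps concentrate around t = 0.5, so intermediate noise levels are visited more often than the extremes near 0 or 1.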
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "_class_name": "SD3ControlNetModel",
+   "_diffusers_version": "0.30.0",
+   "_name_or_path": "./model_hub_tmp_0/.",
+   "attention_head_dim": 64,
+   "caption_projection_dim": 1536,
+   "in_channels": 16,
+   "joint_attention_dim": 4096,
+   "num_attention_heads": 24,
+   "num_layers": 12,
+   "out_channels": 16,
+   "patch_size": 2,
+   "pos_embed_max_size": 192,
+   "sample_size": 128
+ }
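
Note on config.json: a few of these values can be cross-checked against each other and against the README. The sketch below does that arithmetic under one assumption that is not in the config itself: that the SD3 VAE downsamples each image side by a factor of 8 (the `VAE_SCALE_FACTOR` constant is introduced here for illustration).

```python
# Values copied from the config.json added in this commit.
config = {
    "attention_head_dim": 64,
    "caption_projection_dim": 1536,
    "in_channels": 16,
    "num_attention_heads": 24,
    "num_layers": 12,
    "patch_size": 2,
    "sample_size": 128,
}

# Assumption (not stated in config.json): the VAE downsamples by 8x per side.
VAE_SCALE_FACTOR = 8

# Transformer width is the number of heads times the per-head dimension.
hidden_size = config["num_attention_heads"] * config["attention_head_dim"]

# The latent is split into patch_size x patch_size patches, one token each.
patches_per_side = config["sample_size"] // config["patch_size"]
num_image_tokens = patches_per_side ** 2

# Pixel resolution corresponding to the configured latent sample size.
pixel_side = config["sample_size"] * VAE_SCALE_FACTOR

print(hidden_size, patches_per_side, num_image_tokens, pixel_side)
```

Under that assumption, the computed width equals `caption_projection_dim`, and the pixel side matches the 1024x1024 training resolution stated in the README.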
diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4c22ff0d92562c4504dd9545ef0c0cb805d3a786f2c0e821662a5c0b82a4e255
+ size 2238999304
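
Note on the safetensors entry: the three added lines are a Git LFS pointer, not the weights themselves. The pointer records the SHA-256 digest and byte size of the real file, so a downloaded copy can be checked against it. A minimal sketch of that check follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names written for this note, not part of any Git LFS tooling.

```python
import hashlib

# Pointer text copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4c22ff0d92562c4504dd9545ef0c0cb805d3a786f2c0e821662a5c0b82a4e255
size 2238999304
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check that raw bytes have the size and SHA-256 digest named by a pointer."""
    if pointer["algorithm"] != "sha256":
        raise ValueError("only sha256 pointers are handled in this sketch")
    if len(data) != pointer["size"]:
        return False
    return hashlib.sha256(data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
size_gb = pointer["size"] / 1e9  # decimal gigabytes
```

For a file of this size, hashing in chunks from disk would be preferable to loading all bytes into memory; the in-memory version is kept here only for brevity.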