KBlueLeaf committed · Commit b0554a3 (verified) · 1 Parent(s): 7ee026a

Update README.md

Files changed (1): README.md (+106, −5)
---
license: other
license_name: fair-ai-public-license-1.0-sd
license_link: https://freedevproject.org/faipl-1.0-sd/
datasets:
- KBlueLeaf/danbooru2023-webp-4Mpixel
- KBlueLeaf/danbooru2023-sqlite
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
---

# Kohaku XL Zeta
Join us: https://discord.gg/tPBsKDyRR5
## Highlights
- Resumed from Kohaku-XL-Epsilon rev2.
- More stable: long, detailed prompts are no longer a requirement.
- Better fidelity on styles and characters, with support for more styles.
- Surpasses Sanae XL anime on the CCIP metric: over 2,200 characters out of a 3,700-character set reach a CCIP score > 0.9.
- Trained on both danbooru tags and natural language, for better ability with NL captions.
- Trained on a combined dataset, not danbooru alone:
  - danbooru (7.6M images, last id 7832883, 2024/07/10)
  - pixiv (filtered from a 2.6M-image special set; the URL set will be released)
  - PVC figures (around 30k images, internal source)
  - realbooru (around 90k images, used for regularization)
  - 8.46M images in total
- Since the model is trained on both kinds of caption, the context length limit is extended to 300 tokens.


## Usage (PLEASE READ THIS SECTION)
### Recommended Generation Settings
- Resolution: 1024x1024 or a similar pixel count
- CFG scale: 3.5~6.5
- Sampler/scheduler:
  - Euler (A) with any scheduler
  - DPM++ series with the exponential scheduler
  - For other samplers, I personally recommend the exponential scheduler.
- Steps: 12~50
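
For reference, a minimal diffusers sketch of those settings. The repo id and the concrete prompt are assumptions on my part; the resolution, CFG scale, sampler, and step count follow the list above.

```python
# A minimal sketch, assuming this repo loads as a standard SDXL checkpoint
# via diffusers and that the repo id below is correct.
import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "KBlueLeaf/Kohaku-XL-Zeta",  # assumed repo id
    torch_dtype=torch.float16,
).to("cuda")
# Euler (A); per the list above it works with any scheduler config.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt="1girl, masterpiece, best quality, newest, safe, solo, looking at viewer",
    negative_prompt="low quality, worst quality",
    width=1024,
    height=1024,               # ~1M pixels, as recommended
    guidance_scale=5.0,        # inside the 3.5~6.5 range
    num_inference_steps=28,    # inside the 12~50 range
).images[0]
image.save("kxl_zeta_sample.png")
```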

### Prompt Gen
The DTG series of prompt generators can still be used with KXL Zeta.
A brand-new prompt generator that handles both tags and NL captions is under development.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/HYUT5u3DS1bRhAOYmrFe5.png)
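
If you want to drive a DTG model programmatically, something like the sketch below should work. The checkpoint id is my assumption (several DanTagGen checkpoints exist under the same account), and the input string is only a placeholder; the exact prompt template is documented in the DTG model cards.

```python
# A hedged sketch: run a DTG (DanTagGen) checkpoint as a plain causal LM.
# Model id and input template are assumptions; check the DTG card for both.
from transformers import pipeline

tag_gen = pipeline("text-generation", model="KBlueLeaf/DanTagGen-beta")
out = tag_gen("1girl, solo, ", max_new_tokens=64, do_sample=True)
print(out[0]["generated_text"])  # expanded tag string to paste into your prompt
```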

### Prompt Format
Same as Kohaku XL Epsilon or Delta, except that the "general tags" part can be replaced with a natural-language caption.
You can also put both together.
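
For illustration, a combined prompt might look like the sketch below. The angle-bracketed slots follow the Epsilon/Delta ordering as I understand it; the concrete tags and the caption sentence are placeholders of mine, not canonical examples.

```
1girl,
<character>, <series>, <artist>,
masterpiece, best quality, newest, safe,
long hair, looking at viewer, outdoors,
A girl standing in a sunflower field at sunset, smiling at the viewer.
```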

### Special Tags
- Quality tags: masterpiece, best quality, great quality, good quality, normal quality, low quality, worst quality
- Rating tags: safe, sensitive, nsfw, explicit
- Date tags: newest, recent, mid, early, old

#### Rating tags
Danbooru rating → tags to use:
- General: safe
- Sensitive: sensitive
- Questionable: nsfw
- Explicit: nsfw, explicit
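
If you caption your own images for finetuning, the mapping above is trivial to encode. A tiny sketch; the dictionary keys follow danbooru's rating names, and the function name is mine:

```python
# The rating-to-tag mapping from the list above; names are my own choice.
RATING_TO_TAGS = {
    "general": ["safe"],
    "sensitive": ["sensitive"],
    "questionable": ["nsfw"],
    "explicit": ["nsfw", "explicit"],
}

def rating_tags(rating: str) -> list[str]:
    """Return the rating tags to prepend for a given danbooru rating."""
    return RATING_TO_TAGS[rating.lower()]

print(rating_tags("Questionable"))  # ['nsfw']
```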

## Dataset
For better coverage of certain concepts, I used the full danbooru dataset instead of a filtered one,
then added a crawled Pixiv dataset (3~5 tags per query, sorted by popularity) as an add-on dataset.
Since Pixiv's search system only returns 5000 pages per tag, there are not many meaningful images beyond that,
and some of them duplicate the danbooru set (but since I want to reinforce these concepts, I deliberately ignore the duplication).

As with KXL Epsilon rev2, I added realbooru and PVC figure images for more flexibility in concepts and styles.

## Training
- Hardware: Quad RTX 3090s
- Num Train Images: 8,468,798
- Total Epochs: 1
- Total Steps: 16,548
- Training Time: 430 hours (wall time)
- Batch Size: 4
- Grad Accumulation Steps: 32
- Equivalent Batch Size: 512
- Optimizer: Lion8bit
- Learning Rate: 1e-5 for UNet / TE training disabled
- LR Scheduler: Constant (with warmup)
- Warmup Steps: 100
- Weight Decay: 0.1
- Betas: 0.9, 0.95
- Min SNR Gamma: 5
- Debiased Estimation Loss: Enabled
- IP Noise Gamma: 0.05
- Resolution: 1024x1024
- Min Bucket Resolution: 256
- Max Bucket Resolution: 4096
- Mixed Precision: FP16
- Caption Tag Dropout: 0.2
- Caption Group Dropout: 0.2 (drops the tag or NL caption entirely)

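As a sanity check, the numbers above are consistent with each other. A quick sketch; the 4-way data-parallel factor is my reading of "Quad RTX 3090s":

```python
# Checking "Equivalent Batch Size" and "Total Steps" from the config above.
num_images = 8_468_798
batch_size = 4          # per-GPU batch
grad_accum = 32
num_gpus = 4            # assumption: data parallel across the quad 3090s

equivalent_batch = batch_size * grad_accum * num_gpus
print(equivalent_batch)              # 512 -> matches the reported value

steps = num_images // equivalent_batch
print(steps)                         # 16540 -> close to the reported 16,548
                                     # (small gap plausibly from bucketing)
```
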

## Why do you still use SDXL and not a brand-new DiT-based model?
Why do you think HunYuan, SD3, Flux, or AuraFlow would be a better choice when they are slower than SDXL and harder to finetune? <br>
Why do you think a DiT-based model would be a better choice when the DiT paper needed 9x as many samples seen to surpass LDM-4? <br>
Did you know that most of the "improvements" in these "DiT models" come mostly from dataset and scaling? <br>
Did you know that the "UNet" in SDXL has more than 1.75B parameters, over 70% of the total, in its transformer blocks?

Unless someone gives me reasonable compute resources, or some team releases an efficient enough DiT, I will not train any DiT-based anime base model. <br>
But if you give me 8x H100 for a year, I could even train lots of DiTs from scratch (if you want).

## License
Fair AI Public License 1.0-SD