---
license: other
license_name: fair-ai-public-license-1.0-sd
license_link: https://freedevproject.org/faipl-1.0-sd/
datasets:
- KBlueLeaf/danbooru2023-webp-4Mpixel
- KBlueLeaf/danbooru2023-sqlite
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
---

# Kohaku XL Zeta

Join us: https://discord.gg/tPBsKDyRR5

## Highlights

- Resumed from Kohaku-XL-Epsilon rev2.
- More stable: long, detailed prompts are no longer a requirement.
- Better fidelity on styles and characters, with support for more styles.
- CCIP metric surpasses Sanae XL anime: over 2,200 of 3,700 characters reach a CCIP score above 0.9.
- Trained on both danbooru tags and natural language, with better ability on natural-language captions.
- Trained on a combined dataset, not only danbooru:
  - danbooru (7.6M images, last id 7832883, 2024/07/10)
  - pixiv (filtered from a 2.6M special set; the URL set will be released)
  - pvc figure (around 30k images, internal source)
  - realbooru (around 90k images, for regularization)
  - 8.46M images in total
- Since the model is trained on both kinds of captions, the context length limit is extended to 300 tokens.

## Usage (PLEASE READ THIS SECTION)

### Recommended Generation Settings

- Resolution: 1024x1024 or a similar pixel count
- CFG scale: 3.5~6.5
- Sampler/scheduler:
  - Euler (A) / any scheduler
  - DPM++ series / exponential scheduler
  - For other samplers, I personally recommend the exponential scheduler.
- Steps: 12~50

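The settings above can be applied with `diffusers` (the library declared in the metadata); a minimal sketch, where the repository id passed in is a placeholder you replace with this model's actual Hugging Face repo:

```python
# Minimal text-to-image sketch following the recommended settings above.
GEN_SETTINGS = {
    "width": 1024, "height": 1024,   # ~1M pixel resolution
    "guidance_scale": 5.0,           # within the recommended 3.5~6.5 CFG range
    "num_inference_steps": 24,       # within the recommended 12~50 step range
}

def generate(prompt: str, repo_id: str, negative: str = "worst quality, low quality"):
    # Imports kept local so the settings above can be inspected
    # without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

    pipe = StableDiffusionXLPipeline.from_pretrained(
        repo_id, torch_dtype=torch.float16
    ).to("cuda")
    # Euler Ancestral pairs with any scheduler setting, per the notes above.
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
    return pipe(prompt, negative_prompt=negative, **GEN_SETTINGS).images[0]
```

Call it as `generate("1girl, masterpiece, newest, safe", "<this model's repo id>")`; swap in a DPM++ scheduler class if you prefer that series.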
### Prompt Gen

DTG-series prompt generators can still be used with KXL Zeta.
A brand-new prompt generator that combines both tags and natural-language captions is under development.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/HYUT5u3DS1bRhAOYmrFe5.png)

### Prompt Format

Same as Kohaku XL Epsilon or Delta, but you can replace the "general tags" section with a natural-language caption.
You can also put both together.

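As a rough sketch of that assembly (the helper and its section ordering are illustrative, not an official API; check the Epsilon/Delta cards for the exact template):

```python
# Illustrative prompt assembly: the "general" slot can hold danbooru tags,
# a natural-language caption, or both, as described above.
def build_prompt(character: str, series: str, general: str, special: list[str]) -> str:
    parts = [character, series, general, ", ".join(special)]
    return ", ".join(p for p in parts if p)

prompt = build_prompt(
    "hakurei reimu",
    "touhou",
    # tags plus an NL caption together, as the section above allows
    "1girl, solo, red skirt. A shrine maiden standing under falling petals.",
    ["masterpiece", "newest", "safe"],   # quality / date / rating tags
)
```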
### Special Tags

- Quality tags: masterpiece, best quality, great quality, good quality, normal quality, low quality, worst quality
- Rating tags: safe, sensitive, nsfw, explicit
- Date tags: newest, recent, mid, early, old

#### Rating tags

- General: safe
- Sensitive: sensitive
- Questionable: nsfw
- Explicit: nsfw, explicit

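If you map site ratings to prompt tags programmatically, the table above amounts to (helper name is illustrative, not an official API):

```python
# Danbooru-style rating levels mapped to this model's rating tags,
# following the table above.
RATING_TAGS = {
    "general": ["safe"],
    "sensitive": ["sensitive"],
    "questionable": ["nsfw"],
    "explicit": ["nsfw", "explicit"],
}

def rating_to_tags(rating: str) -> str:
    return ", ".join(RATING_TAGS[rating.lower()])
```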
## Dataset

For better coverage of certain concepts, I use the full danbooru dataset instead of a filtered one,
then add a crawled Pixiv dataset (from 3~5 tags, sorted by popularity) as an add-on dataset.
Since Pixiv's search system only allows 5000 pages per tag, it does not contribute many meaningful images, and some of them duplicate the danbooru set (but since I want to reinforce these concepts, I deliberately ignore the duplication).

As with KXL Epsilon rev2, I add realbooru and PVC figure images for more flexibility in concepts/styles.

## Training

- Hardware: Quad RTX 3090s
- Num Train Images: 8,468,798
- Total Epochs: 1
- Total Steps: 16,548
- Training Time: 430 hours (wall time)
- Batch Size: 4
- Grad Accumulation Steps: 32
- Equivalent Batch Size: 512
- Optimizer: Lion8bit
- Learning Rate: 1e-5 for UNet / TE training disabled
- LR Scheduler: Constant (with warmup)
- Warmup Steps: 100
- Weight Decay: 0.1
- Betas: 0.9, 0.95
- Min SNR Gamma: 5
- Debiased Estimation Loss: Enabled
- IP Noise Gamma: 0.05
- Resolution: 1024x1024
- Min Bucket Resolution: 256
- Max Bucket Resolution: 4096
- Mixed Precision: FP16
- Caption Tag Dropout: 0.2
- Caption Group Dropout: 0.2 (for dropping the tag or NL caption group entirely)

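The equivalent batch size follows from the per-GPU batch, gradient accumulation, and GPU count listed above; a quick check:

```python
# Equivalent batch = per-GPU batch x grad accumulation steps x num GPUs.
batch_size = 4
grad_accum = 32
num_gpus = 4          # quad RTX 3090s

equivalent_batch = batch_size * grad_accum * num_gpus
print(equivalent_batch)  # 512

# One epoch over 8,468,798 images at this batch size gives roughly the
# reported step count (aspect-ratio bucketing shifts the exact number).
approx_steps = 8_468_798 // equivalent_batch   # ~16,540 vs. 16,548 reported
```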
## Why do you still use SDXL and not a brand-new DiT-based model?

Why do you think HunYuan, SD3, Flux, or AuraFlow would be a better choice, even though they are slower than SDXL and more difficult to finetune? <br>
Why do you think DiT-based models would be a better choice, even though the DiT paper needed 9x the samples seen to surpass LDM-4? <br>
Do you know that most of the "improvements" of these "DiT models" come mostly from dataset and scaling? <br>
Do you know that the "UNet" in SDXL has more than 1.75B parameters, over 70% of the total, in its transformer blocks?

Unless someone gives me reasonable compute resources, or some team releases an efficient enough DiT, I will not train any DiT-based anime base model. <br>
But if you give me 8xH100 for a year, I could even train lots of DiTs from scratch (if you want).

## License

Fair-AI-Public-License-1.0-SD