FA770 committed · Commit 1c8b7fa · verified · 1 Parent(s): bd01dd1

Update README.md

  1. README.md +97 -3
README.md CHANGED
@@ -1,3 +1,97 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ datasets:
4
+ - KBlueLeaf/danbooru2023-webp-4Mpixel
5
+ - KBlueLeaf/danbooru2023-metadata-database
6
+ base_model:
7
+ - black-forest-labs/FLUX.1-schnell
8
+ pipeline_tag: text-to-image
9
+ tags:
10
+ - anime
11
+ - girls
12
+ ---
13
+
14
+ ![sample_image](./sample_images/1.webp)
15
+
16
+ # Model Information
17
+
18
+ **Note:** This is a Schnell-based model, but it requires a CFG scale of 3.5 or higher and at least 20 steps. It must be used with `clip_l_sumeshi_f1s`.
19
+
20
+ My English is terrible, so I use translation tools.
21
+
22
+ ## Description
23
+ Sumeshi flux.1 S is an experimental anime model built to test whether de-distilling Schnell and enabling CFG works. Negative prompts work to some extent. Because the model uses CFG, generation takes about twice as long as with a regular FLUX model at the same number of steps. The output is blurry and the style varies with the prompt, perhaps because the model has not been trained enough.
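The doubled generation time follows from how classifier-free guidance is usually computed: each step evaluates the model once with the prompt and once with the negative (or empty) prompt, then extrapolates between the two predictions. Here is a minimal sketch in plain Python. `cfg_combine` and the toy prediction lists are illustrative names and values, not part of this model's code.

```python
def cfg_combine(cond, uncond, cfg_scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy predictions standing in for two model evaluations at one step.
cond_pred = [1.0, 2.0, 3.0]    # prediction with the positive prompt
uncond_pred = [0.5, 1.0, 1.5]  # prediction with the negative prompt

# A scale of 1 returns the conditional prediction unchanged (no guidance).
print(cfg_combine(cond_pred, uncond_pred, 1.0))
# A scale of 3.5 pushes the result away from the negative-prompt prediction.
print(cfg_combine(cond_pred, uncond_pred, 3.5))
```

At a scale of 1 the negative prompt drops out of the formula entirely, so it can only influence the result at higher scales.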
24
+
25
+ ## Usage
26
+ - **Resolution:** Same as other FLUX models
27
+ - **CFG Scale:** 3.5 to 7 (a scale of 1 does not produce decent output)
28
+ - **Steps:** 20 to 60 (not the roughly 4 steps typical of Schnell)
29
+ - **(Distilled) Guidance Scale:** 0 (has no effect because the model is Schnell-based)
30
+ - **Sampler:** Euler
31
+ - **Scheduler:** Simple, Beta
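As a quick sanity check before generating, the ranges above can be encoded in a small helper. This is only a sketch: `check_settings` and its argument names are hypothetical and do not correspond to any particular UI or library.

```python
def check_settings(cfg_scale, steps, distilled_guidance=0.0):
    """Return a list of warnings for values outside the ranges listed above."""
    warnings = []
    if not 3.5 <= cfg_scale <= 7:
        warnings.append("CFG scale should be between 3.5 and 7")
    if not 20 <= steps <= 60:
        warnings.append("steps should be between 20 and 60")
    if distilled_guidance != 0:
        warnings.append("distilled guidance has no effect; leave it at 0")
    return warnings


print(check_settings(cfg_scale=5.0, steps=30))  # settings inside the ranges
print(check_settings(cfg_scale=1.0, steps=4))   # typical Schnell settings
```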
32
+
33
+ ## Prompt Format (from [Kohaku-XL-Epsilon](https://huggingface.co/KBlueLeaf/Kohaku-XL-Epsilon))
34
+ ```<1girl/1boy/1other/...>, <character>, <series>, <artists>, <general tags>, <quality tags>, <year tags>, <meta tags>, <rating tags>```
35
+
36
+ Because the model has had little training, the `<character>`, `<series>`, and `<artists>` tags are almost non-functional. Training focused on girl characters, so the model may not generate boys or non-human subjects well. The dataset was built with hakubooru, so the prompt format matches the KohakuXL format. In my experiments, however, following the format strictly is not necessary: the model interprets natural-language prompts to some extent.
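One way to keep tags in the documented order is to assemble the prompt from named groups. The sketch below is an assumption-laden illustration: `build_prompt`, the group names, and the example general tags are hypothetical, and only the ordering comes from the format above.

```python
# Tag groups in the order given by the prompt format above.
TAG_ORDER = ["subject", "character", "series", "artists",
             "general", "quality", "year", "meta", "rating"]


def build_prompt(**groups):
    """Join the supplied tag groups in the documented order, skipping empty ones."""
    parts = []
    for name in TAG_ORDER:
        parts.extend(groups.get(name, []))
    return ", ".join(parts)


prompt = build_prompt(
    subject=["1girl"],
    general=["solo", "long hair", "smile"],  # illustrative general tags
    quality=["masterpiece"],
    year=["newest"],
    rating=["safe"],
)
print(prompt)
```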
37
+
38
+ ### Special Tags
39
+ - **Quality Tags:** masterpiece, best quality, great quality, good quality, normal quality, low quality, worst quality
40
+ - **Rating Tags:** safe, sensitive, nsfw, explicit
41
+ - **Date Tags:** newest, recent, mid, early, old
42
+
43
+ ## Training
44
+
45
+ ### Dataset Preparation
46
+ I used [hakubooru](https://github.com/KohakuBlueleaf/HakuBooru)-based custom scripts.
47
+
48
+ - **Exclude Tags:** `traditional_media, photo_(medium), scan, animated, animated_gif, lowres, non-web_source, variant_set, tall_image, duplicate, pixel-perfect_duplicate`
49
+ - **Minimum Post ID:** 1,000,000
50
+
51
+ ### Key Addition
52
+ I added zero-filled tensors under the `guidance_in` keys to the Schnell model. Their shapes match the corresponding keys in Dev, as inferred from `flux/src/flux/model.py`. I did this because the trainer did not work properly when these keys were missing and the model name did not include 'schnell'. Because the tensors are all zeros, my understanding is that distilled guidance has no effect, just as in the original Schnell model. Given my limited skills and how forcibly the keys were added, I am not sure this was the right approach.
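The effect of zero-filled weights can be illustrated in plain Python. This sketch assumes `guidance_in` is a two-layer MLP (Linear, SiLU, Linear). The key names and default sizes in `zero_guidance_shapes` are my reading of the Dev architecture and should be checked against `flux/src/flux/model.py`; all function names here are hypothetical.

```python
import math


def zero_guidance_shapes(hidden_size=3072, in_dim=256):
    """Assumed key names and shapes for the zero tensors (not verified here)."""
    return {
        "guidance_in.in_layer.weight": (hidden_size, in_dim),
        "guidance_in.in_layer.bias": (hidden_size,),
        "guidance_in.out_layer.weight": (hidden_size, hidden_size),
        "guidance_in.out_layer.bias": (hidden_size,),
    }


def silu(x):
    return x / (1.0 + math.exp(-x))


def linear(weight, bias, x):
    """Plain matrix-vector product plus bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def zero_mlp_output(x, hidden_size):
    """Run x through Linear -> SiLU -> Linear with all weights and biases zero."""
    w_in = [[0.0] * len(x) for _ in range(hidden_size)]
    w_out = [[0.0] * hidden_size for _ in range(hidden_size)]
    zeros = [0.0] * hidden_size
    hidden = [silu(h) for h in linear(w_in, zeros, x)]
    return linear(w_out, zeros, hidden)


# Tiny sizes keep the demo fast; the output is all zeros for any input.
print(zero_mlp_output([0.3, -1.2, 2.0], hidden_size=4))
print(zero_guidance_shapes())
```

Because the embedding is zero regardless of the guidance value passed in, it contributes nothing to the conditioning vector, which is consistent with the guidance scale having no effect.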
53
+
54
+ ### Training Details
55
+ The working assumption is that the more the model is trained, the more the network is restructured, the more the distillation is undone, and the better CFG works.
56
+
57
+ - **Training Hardware:** A single RTX 4090
58
+ - **Method:** LoRA training and merging the results
59
+ - **Training Script:** [sd-scripts](https://github.com/kohya-ss/sd-scripts)
60
+ - **Basic Settings:**
61
+ `accelerate launch --num_cpu_threads_per_process 4 flux_train_network.py --network_module networks.lora_flux --sdpa --gradient_checkpointing --cache_latents --cache_latents_to_disk --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --max_data_loader_n_workers 1 --save_model_as "safetensors" --mixed_precision "bf16" --fp8_base --save_precision "bf16" --full_bf16 --min_bucket_reso 320 --max_bucket_reso 1536 --seed 1 --max_train_epochs 1 --keep_tokens_separator "|||" --network_dim 32 --network_alpha 32 --unet_lr 1e-4 --text_encoder_lr 5e-5 --train_batch_size 3 --gradient_accumulation_steps 2 --optimizer_type adamw8bit --lr_scheduler="constant_with_warmup" --lr_warmup_steps 100 --vae_batch_size 8 --cache_info --guidance_scale 7 --timestep_sampling shift --model_prediction_type raw --discrete_flow_shift 3.2 --loss_type l2 --highvram `
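With `--network_dim 32 --network_alpha 32`, the usual LoRA scaling factor `alpha / dim` is 1, so merging adds the low-rank product to the base weight unscaled. The sketch below illustrates the generic LoRA merge formula on toy matrices. `merge_lora` is a hypothetical helper, not the sd-scripts merge code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, up, down, dim, alpha, multiplier=1.0):
    """Return weight + multiplier * (alpha / dim) * (up @ down)."""
    scale = multiplier * alpha / dim
    delta = matmul(up, down)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy 2x2 weight with a rank-1 update (real layers are far larger).
base = [[1.0, 0.0], [0.0, 1.0]]
up = [[1.0], [2.0]]    # shape (out_features, rank)
down = [[0.5, 0.25]]   # shape (rank, in_features)

print(merge_lora(base, up, down, dim=32, alpha=32))
```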
62
+
63
+ 1. 3,893 images (res512 bs4 / res768 bs2 / res1024 bs1, acc4), 1 epoch
64
+
65
+ 2. 60,000 images (res768 bs3 acc2), 1 epoch
66
+
67
+ 3. 36,000 images (res1024 bs1 acc3), 1 epoch
68
+
69
+ 4. 3,000 images (res1024 bs1 acc1), 1 epoch
70
+
71
+ 5. 18,000 images (res1024 bs1 acc3), 1 epoch
72
+
73
+ 6. merged into model and CLIP_L
74
+
75
+ 7. 693 images (res1024 bs1 acc3), 1 epoch
76
+
77
+ 8. 693 images (res1024 bs1 acc3 warmup50), 1 epoch
78
+
79
+ 9. 693 images (res1024 bs1 acc3 warmup50), 10 epochs
80
+
81
+ 10. 693 images (res1024 bs1 acc3 warmup50), 15 epochs
82
+
83
+ 11. merged into model and CLIP_L
84
+
85
+ 12. 543 images (res1024 bs1 acc3 warmup50 --optimizer_args "betas=0.9,0.95" "eps=1e-06" "weight_decay=0.1" --caption_dropout_rate 0.1 --shuffle_caption --network_train_unet_only), 20 epochs
86
+
87
+ 13. merged into model and CLIP_L
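For a rough sense of scale, the number of optimizer steps in each stage can be estimated as images divided by the effective batch size (batch size times gradient accumulation), times the number of epochs. This sketch ignores resolution bucketing and other trainer details, so actual step counts will differ somewhat. `optimizer_steps` is a hypothetical helper.

```python
import math


def optimizer_steps(images, batch_size, grad_accum, epochs=1):
    """Estimate steps as ceil(ceil(images / batch_size) / grad_accum) * epochs."""
    batches = math.ceil(images / batch_size)
    return math.ceil(batches / grad_accum) * epochs


# (label, images, batch size, accumulation, epochs) taken from the list above.
stages = [
    ("stage 2", 60_000, 3, 2, 1),
    ("stage 3", 36_000, 1, 3, 1),
    ("stage 9", 693, 1, 3, 10),
    ("stage 12", 543, 1, 3, 20),
]
for label, images, bs, acc, epochs in stages:
    print(label, optimizer_steps(images, bs, acc, epochs))
```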
88
+
89
+ ## Resources (License)
90
+ - **FLUX.1-schnell (Apache 2.0)**
91
+ - **danbooru2023-webp-4Mpixel (MIT)**
92
+ - **danbooru2023-metadata-database (MIT)**
93
+
94
+ ## Acknowledgements
95
+ - **black-forest-labs:** Thanks for publishing a great open source model.
96
+ - **kohya-ss:** Thanks for publishing the essential training scripts and for the quick updates.
97
+ - **Kohaku-Blueleaf:** Thanks for generously publishing the dataset scripts and the various training conditions.