---
license: creativeml-openrail-m
language:
  - en
pipeline_tag: text-to-image
---

This is a fine-tune of Stable Diffusion 1.5 with the MSE VAE from RunwayML, tuned to generate at a nominal size of 768x768 or at various aspect ratios of similar total pixel count.
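
As a quick usage sketch with the Hugging Face diffusers library: the repository id below is an assumption based on this model card's location, so substitute the actual repo id or a local path to the downloaded weights.

```python
import torch
from diffusers import StableDiffusionPipeline

# Repo id is assumed; replace with the actual repository or a local path.
pipe = StableDiffusionPipeline.from_pretrained(
    "panopstor/SD15-768",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Generate at the nominal 768x768 resolution, or any aspect ratio with a
# similar total pixel count (e.g. 640x896 or 896x640).
image = pipe(
    "a watercolor painting of a lighthouse at dusk",
    width=768,
    height=768,
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("lighthouse.png")
```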

Fine-tuned using EveryDream2 (https://github.com/victorchall/EveryDream2trainer) for 40 epochs across 4 sessions (10 epochs each) on 30,000 handpicked images covering a wide variety of imagery from fine art, photography, and video games.
Approximate training time was 60 hours on an RTX 6000 Ada 48GB. Training started at a nominal size of 512 with batch size 12, then 640 at batch size 12, and finally 768 at batch size 8 with 4 gradient accumulation steps. Unet weights were tuned with the bitsandbytes AdamW8bit optimizer at a learning rate of 1e-6, held constant for the first 30 epochs and following a cosine schedule for the final 10. The text encoder was tuned via backpropagation through the Unet using the same AdamW8bit optimizer, with a learning rate of 2e-7 on a cosine schedule for each session and a weight decay of 0.040 to account for the lower norm.
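
The final phase therefore runs at an effective batch size of 8 × 4 = 32. Purely as an illustration (not the trainer's own code), the Unet learning-rate schedule described above looks like this:

```python
import math

# Constant at 1e-6 for the first 30 epochs, cosine decay over the final 10.
def unet_lr(epoch, base_lr=1e-6, constant_epochs=30, total_epochs=40):
    if epoch < constant_epochs:
        return base_lr
    progress = (epoch - constant_epochs) / (total_epochs - constant_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for e in (0, 29, 30, 35, 39):
    print(e, unet_lr(e))
```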

The EveryDream2 trainer implements aspect ratio batch fitting, so cropping artifacts are greatly reduced. Higher-resolution outputs are more consistent than those of the original SD1.5 checkpoint, which tends to duplicate subject matter when generating beyond its trained 512x512 resolution.
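
As a rough illustration of aspect ratio bucketing (not EveryDream2's actual implementation), each training image can be assigned to the bucket whose aspect ratio is closest to its own, with every bucket holding roughly 768x768 total pixels:

```python
# Build buckets of width/height pairs (multiples of 64) totalling ~768*768 pixels.
def build_buckets(target_pixels=768 * 768, step=64, max_side=1152):
    buckets = []
    w = step
    while w <= max_side:
        h = round(target_pixels / w / step) * step
        if step <= h <= max_side:
            buckets.append((w, h))
        w += step
    return buckets

# Pick the bucket whose aspect ratio best matches the image, avoiding heavy crops.
def nearest_bucket(width, height, buckets):
    aspect = width / height
    return min(buckets, key=lambda wh: abs(wh[0] / wh[1] - aspect))

buckets = build_buckets()
print(nearest_bucket(1920, 1080, buckets))  # -> (1024, 576)
```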

Optimizer states are provided in .pt form for both the text encoder and the Unet; they can be loaded along with the Diffusers weights to resume training in the EveryDream2 trainer.
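
For reference, here is a hedged sketch of restoring one of these states with plain PyTorch and diffusers; the repo id, file name, and exact contents of the .pt files are assumptions, and in practice the EveryDream2 trainer handles resumption itself.

```python
import torch
import bitsandbytes as bnb
from diffusers import UNet2DConditionModel

# Load the Unet from the Diffusers weights (repo id assumed).
unet = UNet2DConditionModel.from_pretrained("panopstor/SD15-768", subfolder="unet")

# Recreate the training optimizer and restore its saved state.
optimizer = bnb.optim.AdamW8bit(unet.parameters(), lr=1e-6)
state = torch.load("optimizer_unet.pt", map_location="cpu")  # placeholder file name
optimizer.load_state_dict(state)
```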