---
license: creativeml-openrail-m
language:
- en
pipeline_tag: text-to-image
---
This is a fine-tune of Stable Diffusion 1.5 with the MSE VAE from RunwayML, tuned to generate at a nominal size of 768x768 or at various aspect ratios with a similar total pixel count.
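
To pick a width and height for a non-square aspect ratio while keeping roughly the same pixel budget as 768x768, you can solve for the height from the target area and round both sides to a fixed step. This is a minimal sketch: the rounding step of 64 is an assumption (SD 1.5 only requires multiples of 8), and `dims_for_aspect` is a hypothetical helper, not part of this model or any library.

```python
import math


def dims_for_aspect(aspect: float, nominal: int = 768, step: int = 64) -> tuple[int, int]:
    """Return (width, height) near `aspect` (width / height) with area close to nominal**2.

    Both sides are rounded to the nearest multiple of `step`; step=64 is an
    assumed granularity, not a documented requirement of this model.
    """
    target_area = nominal * nominal
    height = math.sqrt(target_area / aspect)  # from width * height = area, width = aspect * height
    width = height * aspect
    return round(width / step) * step, round(height / step) * step


if __name__ == "__main__":
    for name, aspect in [("1:1", 1.0), ("16:9", 16 / 9), ("3:4", 3 / 4), ("2:1", 2.0)]:
        w, h = dims_for_aspect(aspect)
        print(f"{name}: {w}x{h} ({w * h} px vs {768 * 768} px)")
```

For example, 16:9 maps to 1024x576, which has exactly the same pixel count as 768x768.
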
Fine-tuned using EveryDream2 (https://github.com/victorchall/EveryDream2trainer) for 40 epochs across 4 sessions (10 epochs each) on 30,000 hand-picked images covering a wide variety of imagery from fine art, photography, and video games.
Training took approximately 60 hours on an RTX 6000 Ada: initially at a nominal size of 512 with batch size 12, then 640 with batch size 12, and finally 768 with batch size 8 and 4 gradient accumulation steps.
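
With gradient accumulation, each optimizer step in the 768 stage combines gradients from 4 micro-batches of 8 images, for an effective batch size of 8 x 4 = 32. The sketch below shows that arithmetic and checks, on toy scalar "gradients", that averaging micro-batch means matches the mean over one large batch. It assumes no accumulation (a factor of 1) in the 512 and 640 stages, since the card only mentions accumulation for 768; the helper names are hypothetical.

```python
# (resolution, batch_size, grad_accum) per stage; grad_accum=1 for 512/640 is an assumption.
STAGES = [
    (512, 12, 1),
    (640, 12, 1),
    (768, 8, 4),
]


def effective_batch_size(batch_size: int, grad_accum: int) -> int:
    """Number of samples contributing to each optimizer step."""
    return batch_size * grad_accum


def accumulated_mean(per_sample_grads: list[float], micro_batch: int) -> float:
    """Combine micro-batch mean gradients the way accumulation with a mean loss does.

    Assumes len(per_sample_grads) is a multiple of micro_batch.
    """
    n_micro = len(per_sample_grads) // micro_batch
    total = 0.0
    for i in range(n_micro):
        chunk = per_sample_grads[i * micro_batch:(i + 1) * micro_batch]
        # Scale each micro-batch mean by 1/n_micro so the sum matches one big-batch mean.
        total += (sum(chunk) / micro_batch) / n_micro
    return total


if __name__ == "__main__":
    for res, bs, ga in STAGES:
        print(f"{res}px: effective batch size {effective_batch_size(bs, ga)}")
    grads = [float(i) for i in range(32)]  # toy per-sample gradients
    full_batch = sum(grads) / len(grads)
    print(accumulated_mean(grads, micro_batch=8), full_batch)
```
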
The UNet weights were tuned with the bitsandbytes AdamW8bit optimizer at a learning rate of 1e-6, held constant for the first 30 epochs and following a cosine schedule for the final 10.
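
The UNet schedule described above can be sketched as a function of the epoch: flat at 1e-6 for 30 epochs, then a half-cosine decay over the last 10. This is an illustration under stated assumptions, not the trainer's code: it assumes the cosine decays to zero, uses no warmup, and works per epoch, whereas real trainers usually update the learning rate per optimizer step. `unet_lr` is a hypothetical name.

```python
import math


def unet_lr(epoch: float, base_lr: float = 1e-6,
            constant_epochs: int = 30, cosine_epochs: int = 10) -> float:
    """Constant LR for `constant_epochs`, then cosine decay to 0 over `cosine_epochs` (assumed)."""
    if epoch < constant_epochs:
        return base_lr
    # Fraction of the cosine phase completed, clamped at the end of training.
    progress = min((epoch - constant_epochs) / cosine_epochs, 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 29, 30, 35, 40):
        print(e, unet_lr(e))
```
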
The text encoder was tuned via backpropagation through the UNet using the same AdamW8bit optimizer, at a learning rate of 2e-7 with a cosine schedule in each session.
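
Because the text encoder's output conditions the UNet, the UNet's loss also produces gradients for the text encoder, which is how it is trained here. Its learning rate restarts at 2e-7 at the start of each 10-epoch session and follows a cosine curve within it. The sketch below models that per-session schedule; it assumes each cosine decays toward zero at the end of its session and is evaluated per epoch, and `text_encoder_lr` is a hypothetical name rather than an EveryDream2 setting.

```python
import math


def text_encoder_lr(epoch: float, base_lr: float = 2e-7, session_epochs: int = 10) -> float:
    """Cosine schedule that restarts at `base_lr` at the start of every session (assumed shape)."""
    # Position within the current session, in [0, 1).
    progress = (epoch % session_epochs) / session_epochs
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # 4 sessions x 10 epochs, as described on this card.
    for e in (0, 5, 9.9, 10, 15, 30, 39.9):
        print(e, text_encoder_lr(e))
```
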
The EveryDream2 trainer implements aspect ratio batch fitting, so cropping artifacts are greatly reduced. Higher-resolution outputs are more consistent than with the original SD1.5 checkpoint, which tends to duplicate subject matter at resolutions beyond its trained 512x512.
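
Aspect ratio batch fitting (often called bucketing) can be sketched as follows. Each image is assigned to the bucket resolution whose aspect ratio is closest (compared in log space so 2:1 and 1:2 are treated symmetrically), and batches are built only from images in the same bucket, so each image needs only a small resize or crop instead of a square center crop. This is a minimal, hypothetical illustration, not EveryDream2's implementation: the bucket list, the function names, and keeping partial final batches are all assumptions.

```python
import math
from collections import defaultdict

Bucket = tuple[int, int]  # (width, height)


def nearest_bucket(width: int, height: int, buckets: list[Bucket]) -> Bucket:
    """Pick the bucket whose aspect ratio is closest to the image's, in log space."""
    log_aspect = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - log_aspect))


def make_batches(images: list[tuple[str, int, int]], buckets: list[Bucket],
                 batch_size: int) -> list[tuple[Bucket, list[str]]]:
    """Group (name, width, height) images by bucket, then split each group into batches."""
    groups: dict[Bucket, list[str]] = defaultdict(list)
    for name, w, h in images:
        groups[nearest_bucket(w, h, buckets)].append(name)
    batches = []
    for bucket, names in groups.items():
        for i in range(0, len(names), batch_size):
            # Every batch shares one resolution; a partial last batch is kept here.
            batches.append((bucket, names[i:i + batch_size]))
    return batches


if __name__ == "__main__":
    buckets = [(768, 768), (1024, 576), (576, 1024)]  # hypothetical buckets near 768x768 in area
    images = [("a", 1920, 1080), ("b", 1000, 1000), ("c", 1080, 1920), ("d", 800, 790)]
    for bucket, names in make_batches(images, buckets, batch_size=2):
        print(bucket, names)
```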