---
license: mit
library_name: diffusers
---

# Ostris VAE - KL-f8-d16

A 16-channel VAE with 8x downsampling. Trained from scratch on a balanced mix of photos, artwork, text, cartoons, and vector images.

It is lighter weight than most VAEs, with only 57,266,643 parameters (vs. 83,819,683 for the SD3 VAE), which means it is faster and uses less VRAM, yet it scores quite similarly on real images. Plus, it is MIT licensed, so you can do whatever you want with it.
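The parameter counts above imply a sizable reduction; a quick sanity check (plain Python, numbers taken from the table below):

```python
# Parameter counts: this model vs. the SD3 VAE
ostris_params = 57_266_643
sd3_params = 83_819_683

reduction = 1 - ostris_params / sd3_params
print(f"{reduction:.1%} fewer parameters")  # -> 31.7% fewer parameters
```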

| VAE | PSNR (higher better) | LPIPS (lower better) | # params |
|---|---|---|---|
| sd-vae-ft-mse | 26.939 | 0.0581 | 83,653,863 |
| SDXL | 27.370 | 0.0540 | 83,653,863 |
| SD3 | 31.681 | 0.0187 | 83,819,683 |
| Ostris KL-f8-d16 | 31.166 | 0.0198 | 57,266,643 |
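For reference, PSNR is derived directly from the mean squared reconstruction error; a minimal sketch (pure Python, not code from this repo):

```python
import math

def psnr(mse: float, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error.

    max_val is the maximum possible pixel value (1.0 for [0, 1] images).
    """
    return 20 * math.log10(max_val) - 10 * math.log10(mse)

# An MSE of 0.001 on [0, 1] images corresponds to exactly 30 dB
print(psnr(0.001))  # -> 30.0
```

Lower reconstruction error maps to a higher PSNR, which is why "higher is better" in the table above.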

## Compare

Check out the comparison at imgsli.

## What do I do with this?

If you don't know, you probably don't need this. It is meant as an open-source, lighter-weight 16-channel VAE. You would need to train it into a network before it is useful. I plan to do this myself for SD 1.5, SDXL, and possibly PixArt. Follow me on Twitter to keep up with my work on that.

## Note: Not SD3 compatible

This VAE is not SD3 compatible, as it was trained from scratch and has an entirely different latent space.
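For illustration, "KL-f8-d16" means an 8x spatial downsample and 16 latent channels, so the latent tensor shape is easy to predict. A small sketch (hypothetical helper, independent of any library):

```python
def latent_shape(height: int, width: int, f: int = 8, d: int = 16):
    """Latent shape (channels, h, w) produced by a KL-f8-d16 VAE encoder."""
    assert height % f == 0 and width % f == 0, "dims must be divisible by the downsample factor"
    return (d, height // f, width // f)

print(latent_shape(512, 512))  # -> (16, 64, 64)
```

Note that these shapes happen to match SD3's 16-channel latents, but the values come from a different learned distribution, so the two latent spaces are not interchangeable.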