
Open-LiteVAE

GitHub: https://github.com/RGenDiff/open-litevae

This repository contains a LiteVAE model trained with the open-litevae codebase, based on the paper "LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models" (NeurIPS 2024).

Note: This model is intended for demonstration purposes; we do not recommend using it in production.

license: AGPL-3.0


Configuration Details

| Parameter | Value |
|---|---|
| Downscale Factor | 8x |
| Latent Z dim | 12 |
| Encoder Size (params) | B (6.2M) |
| Decoder Size (params) | M (54M) |
| Discriminator | UNetGAN-L |
| Training Set | ImageNet-1k |
| Training Resolution | 128x128 → 256x256 |
| Training Steps | 100k → 50k |
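
For orientation, the checkpoint name olitevaeB_im_f8c12 matches the table: f8 is the 8x spatial downscale and c12 the 12-channel latent. The sketch below is not part of the original card; it only spells out the tensor shapes this implies for a 256x256 input (the second-stage training resolution).

import torch

# Shape arithmetic implied by the configuration above; illustrative only,
# the actual model is loaded in the Usage section below.
downscale, z_channels = 8, 12
height = width = 256  # second-stage training resolution

image = torch.randn(1, 3, height, width)                # (B, 3, H, W) input
latent_shape = (1, z_channels, height // downscale, width // downscale)
print(latent_shape)                                     # (1, 12, 32, 32)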

Metric Comparison

| Model | Z dim | rFID ↓ | LPIPS ↓ | PSNR ↑ | SSIM ↑ |
|---|---|---|---|---|---|
| SD1-VAE | 4 | 0.75 | 0.138 | 25.70 | 0.72 |
| SD3-VAE | 16 | 0.22 | 0.069 | 29.59 | 0.86 |
| olvf8c12 (this repo) | 12 | 0.24 | 0.084 | 28.74 | 0.84 |
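
For reference, one way to compute the per-image reconstruction metrics above (PSNR, SSIM, LPIPS) is sketched below using torchmetrics. This is an assumption of mine, not the evaluation pipeline behind the table, and rFID is omitted because it requires reference statistics over a full dataset.

import torch
from torchmetrics.image import PeakSignalNoiseRatio, StructuralSimilarityIndexMeasure
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

# original images and their VAE reconstructions, both (B, 3, H, W) in [0, 1]
original = torch.rand(4, 3, 256, 256)
recon = torch.rand(4, 3, 256, 256)

psnr = PeakSignalNoiseRatio(data_range=1.0)
ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
lpips = LearnedPerceptualImagePatchSimilarity(net_type="vgg", normalize=True)  # normalize=True: inputs in [0, 1]

print("PSNR :", psnr(recon, original).item())
print("SSIM :", ssim(recon, original).item())
print("LPIPS:", lpips(recon, original).item())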

Usage

# Requires the open-litevae codebase: https://github.com/RGenDiff/open-litevae

from PIL import Image
import torch
import torchvision.transforms as transforms
from torchvision.utils import save_image
from omegaconf import OmegaConf
from safetensors.torch import load_file
from olvae.utils import instantiate_from_config

def load_model_from_config(config_path, ckpt_path, device=torch.device("cuda")):
    """Instantiate the model from its YAML config and load the safetensors weights."""
    config = OmegaConf.load(config_path)
    sd = load_file(ckpt_path)
    model = instantiate_from_config(config.model)
    model.load_state_dict(sd, strict=False)
    model = model.to(device).eval()
    return model

device = torch.device("cuda")

# load the model
olitevae = load_model_from_config(config_path="configs/olitevaeB_im_f8c12.yaml",
                                  ckpt_path="olitevaeB_im_f8c12.safetensors",
                                  device=device)

img_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# encode: the transforms above map [0, 1] pixels to the [-1, 1] range the model expects
image = img_transforms(Image.open(<your image>).convert("RGB")).to(device)
with torch.no_grad():
    latent = olitevae.encode(image.unsqueeze(0)).sample()
print(latent.shape)

# decode and map the reconstruction from [-1, 1] back to [0, 1] before saving
with torch.no_grad():
    y = olitevae.decode(latent)
save_image(y[0] * 0.5 + 0.5, "decoded_image.png")
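
As a quick sanity check (continuing the snippet above; not part of the original card), you can compare the decoded image against the input with a plain-PyTorch PSNR:

# both tensors are mapped from [-1, 1] back to [0, 1] before comparison
x01 = image.unsqueeze(0) * 0.5 + 0.5
y01 = (y * 0.5 + 0.5).clamp(0, 1)
mse = torch.mean((x01 - y01) ** 2)
psnr = 10 * torch.log10(1.0 / mse)
print(f"reconstruction PSNR: {psnr.item():.2f} dB")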

Please Cite the Original Paper

@inproceedings{sadat2024litevae,
  title={Lite{VAE}: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models},
  author={Seyedmorteza Sadat and Jakob Buhmann and Derek Bradley and Otmar Hilliges and Romann M. Weber},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=mTAbl8kUzq}
}