This model encodes a 224x224 RGB image into a 28x28x13-bit (1274 bytes) latent. The compression ratio is 28x28x13 / (224x224x24) = 1/118, i.e. 0.203 bpp (same as VQGAN_f8_8192).
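The bitrate arithmetic can be checked with a few lines of Python (just the numbers above, nothing model-specific):

```python
# Latent bitrate arithmetic for a 224x224 RGB input.
latent_bits = 28 * 28 * 13       # 10192 bits
input_bits = 224 * 224 * 24      # 3 channels x 8 bits per pixel

print(latent_bits // 8)          # 1274 bytes
print(input_bits / latent_bits)  # ~118x compression
print(latent_bits / (224 * 224)) # ~0.203 bits per pixel
```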

Demo: https://huggingface.co./spaces/Blealtan/clip-guided-binary-autoencoder

12M params for Encoder + Decoder. Trained on LAION-Aesthetics V2 5+ (238M images seen during training).

Update: 50M and 200M param checkpoints are now available too :) Check the files.

Guided by https://huggingface.co./laion/CLIP-ViT-B-32-laion2B-s34B-b79K (it's great. better than OpenAI CLIP B/32) and https://github.com/dingkeyan93/DISTS. No GAN loss.
