---
license: apache-2.0
tags:
- text-to-image
- flux
datasets:
- DucHaiten/pony-art
- jordandavis/fashion_num_people
- mattmdjaga/human_parsing_dataset
- Voxel51/Describable-Textures-Dataset
- twodgirl/vndb
---

# Flux Latent Preview at Half-Size

The decoder produces a preview image from Flux latents; similar preview decoders already exist in the wild for the Flux Dev model. Max supported resolution is between 768 and 1024px.

![](images/etoiles.png)

Retraining the [text encoder](https://huggingface.co./twodgirl/flux-text-encoder-neutered) and the VAE decoder has reduced the checkpoint size by around 10GB. This set the model's capabilities back by two years.

## Inference

```python
from diffusers import FluxPipeline
from safetensors.torch import load_model
from tea_model import TeaDecoder
import torch
from torchvision import transforms

@torch.no_grad()
def preview_image(latents, pipe):
    # Unpack the sequence-shaped Flux latents back into a (B, C, H, W) grid.
    latents = FluxPipeline._unpack_latents(latents,
                                           pipe.default_sample_size * pipe.vae_scale_factor,
                                           pipe.default_sample_size * pipe.vae_scale_factor,
                                           pipe.vae_scale_factor)
    # Load the small preview decoder and move it to the GPU.
    tea = TeaDecoder(ch_in=16)
    load_model(tea, './vae_decoder.safetensors')
    tea = tea.to(device='cuda')
    # Keep the latents on the same device as the decoder, then map [-1, 1] to [0, 1].
    output = tea(latents.to(device='cuda', dtype=torch.float32)) / 2.0 + 0.5
    preview = transforms.ToPILImage()(output[0].clamp(0, 1))

    return preview

@torch.no_grad()
def full_size_image(latents, pipe):
    latents = FluxPipeline._unpack_latents(latents,
                                           pipe.default_sample_size * pipe.vae_scale_factor,
                                           pipe.default_sample_size * pipe.vae_scale_factor,
                                           pipe.vae_scale_factor)
    # Undo the latent normalization applied by the pipeline before decoding.
    latents = (latents / pipe.vae.config.scaling_factor) + pipe.vae.config.shift_factor
    torch.cuda.empty_cache()
    pipe.vae = pipe.vae.to(device='cuda')
    # Match the VAE's device and dtype to avoid a CPU/GPU mismatch.
    latents = latents.to(device='cuda', dtype=pipe.vae.dtype)
    pixel_values, = pipe.vae.decode(latents, return_dict=False)
    images = pipe.image_processor.postprocess(pixel_values.to('cpu'), output_type='pil')

    return images

if __name__ == '__main__':
    pipe = FluxPipeline.from_pretrained('black-forest-labs/FLUX.1-dev')
    latents = pipe('cat playing piano',
                   num_inference_steps=10,
                   output_type='latent').images
    # Decode both the full-size image and the half-size preview.
    upscaled = full_size_image(latents, pipe)
    preview = preview_image(latents, pipe)
    preview.save('cat.png')
```

## Disclaimer

Use of this code and the copy of documentation requires citation and attribution to the author via a link to their Hugging Face profile in all resulting work.
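
## Appendix: latent and preview shapes

As a footnote to the Inference section, the sketch below works out the tensor shapes involved in unpacking the latents and decoding a preview. It is a plain-Python illustration, not part of the released code: the helper name `flux_shapes` is hypothetical, the 8x spatial compression and 16 latent channels reflect the Flux VAE as commonly configured, and the 0.5 preview scale is an assumption taken from the "half-size" title rather than measured from `TeaDecoder`.

```python
def flux_shapes(height, width, vae_scale_factor=8, latent_channels=16,
                preview_scale=0.5):
    """Return per-image shapes for packed latents, unpacked latents and preview.

    Assumptions (not confirmed by the model card): the VAE compresses each
    spatial side by `vae_scale_factor`, latents are packed in 2x2 patches,
    and the preview decoder outputs an RGB image at `preview_scale` of the
    full resolution.
    """
    patch = vae_scale_factor * 2
    if height % patch or width % patch:
        raise ValueError(f'height and width must be multiples of {patch}')

    lat_h, lat_w = height // vae_scale_factor, width // vae_scale_factor
    return {
        # Sequence form produced by the pipeline: one token per 2x2 latent patch.
        'packed': ((lat_h // 2) * (lat_w // 2), latent_channels * 4),
        # Grid form fed to a decoder after unpacking.
        'unpacked': (latent_channels, lat_h, lat_w),
        # Assumed preview output: RGB at half the full resolution.
        'preview': (3, int(height * preview_scale), int(width * preview_scale)),
    }


if __name__ == '__main__':
    for name, shape in flux_shapes(1024, 1024).items():
        print(name, shape)
```

Under these assumptions, a 1024x1024 generation has a 128x128 latent grid with 16 channels, and the preview would come out at 512x512.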