---
license: openrail++
tags:
- text-to-image
- stable-diffusion
library_name: diffusers
inference: false
---

# SDXS-512-0.9

SDXS is a model that generates high-resolution images in real time from text prompts. It is trained using score distillation and feature matching. For more information, please refer to our research paper: [SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions](https://arxiv.org/abs/2403.16627). We open-source the model as part of the research.

SDXS-512-0.9 is an **old version** of SDXS-512. For various reasons, we are releasing only this version for the time being and will gradually release other versions.

Model Information:
- Teacher DM: [SD Turbo](https://huggingface.co./stabilityai/sd-turbo)
- Offline DM: [SD v2.1 base](https://huggingface.co./stabilityai/stable-diffusion-2-1-base)
- VAE: [TAESD](https://huggingface.co./madebyollin/taesd)

Note that TAESD may produce low-quality images when `weight_type` is float16. Our image decoder is not compatible with the current version of diffusers, so it is not provided for now.

## Diffusers Usage

![](output.png)

```python
import torch
from diffusers import StableDiffusionPipeline, AutoencoderKL

repo = "IDKiro/sdxs-512-0.9"
seed = 42
weight_type = torch.float32     # or float16

# Load model.
pipe = StableDiffusionPipeline.from_pretrained(repo, torch_dtype=weight_type)
# Optional: use the original VAE, loaded from the repo's "vae_large" subfolder.
# pipe.vae = AutoencoderKL.from_pretrained(repo, subfolder="vae_large")
pipe.to("cuda")

prompt = "portrait photo of a girl, photograph, highly detailed face, depth of field, moody light, golden hour"

# Use the same number of inference steps as the loaded model (1) and set CFG (guidance_scale) to 0.
image = pipe(
    prompt,
    num_inference_steps=1,
    guidance_scale=0,
    generator=torch.Generator(device="cuda").manual_seed(seed)
).images[0]

image.save("output.png")
```

## Cite Our Work

```
@article{song2024sdxs,
  author    = {Yuda Song and Zehao Sun and Xuanwu Yin},
  title     = {SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions},
  journal   = {arXiv preprint arXiv:2403.16627},
  year      = {2024},
}
```