SDXS-512-0.9

SDXS is a model that generates high-resolution images in real time from text prompts. It is trained using score distillation and feature matching. For more information, please refer to our research paper: SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions. We open-source the model as part of the research.

SDXS-512-0.9 is an older version of SDXS-512. For the time being we are releasing only this version; other versions will be released gradually.

Model Information:

Teacher DM: SD Turbo
Offline DM: SD v2.1 base
VAE: TAESD

Note that TAESD may produce low-quality images when weight_type is float16. Our image decoder is not compatible with the current version of diffusers, so it is not provided for now.

Diffusers Usage

import torch
from diffusers import StableDiffusionPipeline, AutoencoderKL

repo = "IDKiro/sdxs-512-0.9"
seed = 42
weight_type = torch.float32     # or float16

# Load model.
pipe = StableDiffusionPipeline.from_pretrained(repo, torch_dtype=weight_type)
# pipe.vae = AutoencoderKL.from_pretrained(repo, subfolder="vae_large", torch_dtype=weight_type)     # use original VAE
pipe.to("cuda")

prompt = "portrait photo of a girl, photograph, highly detailed face, depth of field, moody light, golden hour"

# Use the same number of inference steps the model was trained for (1) and disable CFG (guidance_scale=0).
image = pipe(
    prompt, 
    num_inference_steps=1, 
    guidance_scale=0,
    generator=torch.Generator(device="cuda").manual_seed(seed)
).images[0]

image.save("output.png")
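
Since the model is intended for real-time, one-step generation, it can be useful to check the latency on your own hardware. The following is a minimal sketch, not part of the SDXS release: `measure_latency` is a hypothetical helper that times any zero-argument callable with the standard library. It assumes you pass a synchronization function such as `torch.cuda.synchronize` when timing GPU work, so that queued kernels are included in the measurement.

```python
import statistics
import time
from typing import Callable, Optional


def measure_latency(
    generate: Callable[[], object],
    warmup: int = 3,
    runs: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> dict:
    """Time repeated calls to `generate` and return latency stats in milliseconds.

    `sync` is an optional callable (for example torch.cuda.synchronize) that is
    run before each clock read so queued GPU work is included in the timing.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # Warm-up calls absorb one-time costs such as CUDA context creation.
    for _ in range(warmup):
        generate()

    timings_ms = []
    for _ in range(runs):
        if sync is not None:
            sync()
        start = time.perf_counter()
        generate()
        if sync is not None:
            sync()
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "runs": runs,
        "mean_ms": statistics.fmean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "min_ms": min(timings_ms),
        "max_ms": max(timings_ms),
    }


# Hypothetical usage with the `pipe` and `prompt` defined above (requires a CUDA GPU):
#
#   stats = measure_latency(
#       lambda: pipe(prompt, num_inference_steps=1, guidance_scale=0),
#       sync=torch.cuda.synchronize,
#   )
#   print(f"median latency: {stats['median_ms']:.1f} ms")
```

The measured time covers the whole pipeline call, including text encoding, the single denoising step, VAE decoding, and conversion to a PIL image, so it will be higher than the UNet forward pass alone.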

Cite Our Work

@article{song2024sdxs,
  author    = {Yuda Song and Zehao Sun and Xuanwu Yin},
  title     = {SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions},
  journal   = {arXiv},
  year      = {2024},
}