metadata

license: openrail++
tags:
  - text-to-image
  - stable-diffusion
library_name: diffusers
inference: false

SDXS-512-0.9

SDXS is a model that generates high-resolution images in real time from text prompts. It is trained using score distillation and feature matching. For more information, please refer to our research paper, SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions. We open-source the model as part of the research.

SDXS-512-0.9 is an old version of SDXS-512. For the time being we are releasing only this version, and we will release other versions gradually.

To avoid possible risks (especially commercial and copyright risks), SDXS-512-1.0 and SDXS-1024-1.0 will not be made available in the near term. As an alternative, we will provide new versions with the same generation quality. We expect to have a new version of SDXS-512 available within a week, with almost the same generation quality as SDXS-512-1.0.

Model Information:

Teacher DM: SD Turbo
Offline DM: SD v2.1 base
VAE: TAESD

This model differs from version 1.0 in three main respects:

This version uses TAESD as the VAE, which may produce low-quality images when weight_type is float16. Our own image decoder is not compatible with the current version of diffusers, so we are not providing it for now.
This version was not fine-tuned with LoRA-GAN as described in the implementation details section of the paper, which may result in slightly inferior image details.
This version replaces self-attention with cross-attention in the highest-resolution stages, which adds only minimal overhead compared to removing those layers outright.
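
Because TAESD may degrade image quality in float16, one way to handle precision is to default to float32 and opt into float16 only when explicitly requested. The following is a minimal sketch of that idea, not part of the official usage. The helper names `choose_weight_type` and `load_pipeline` are hypothetical, and falling back to float32 when no CUDA device is present is an assumption made for this example.

```python
def choose_weight_type(cuda_available: bool, prefer_half: bool = False) -> str:
    """Return the name of the torch dtype to load the pipeline with.

    float32 is the default; float16 is chosen only when the caller asks
    for it and a CUDA device is available (an assumption of this sketch).
    """
    if prefer_half and cuda_available:
        return "float16"
    return "float32"


def load_pipeline(repo: str = "IDKiro/sdxs-512-0.9", prefer_half: bool = False):
    """Load the pipeline with the dtype picked by choose_weight_type."""
    # Heavy imports are kept local so the helper above stays dependency-free.
    import torch
    from diffusers import StableDiffusionPipeline

    cuda_available = torch.cuda.is_available()
    dtype = getattr(torch, choose_weight_type(cuda_available, prefer_half))
    pipe = StableDiffusionPipeline.from_pretrained(repo, torch_dtype=dtype)
    return pipe.to("cuda" if cuda_available else "cpu")
```

Calling `load_pipeline()` keeps the float32 default, while `load_pipeline(prefer_half=True)` trades some image quality for lower memory use on a GPU.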

There is a third-party demo from @ameerazam08. We will provide an official demo when version 1.0 is released, which hopefully won't be long.

Diffusers Usage

import torch
from diffusers import StableDiffusionPipeline, AutoencoderKL

repo = "IDKiro/sdxs-512-0.9"
seed = 42
weight_type = torch.float32     # or float16

# Load model.
pipe = StableDiffusionPipeline.from_pretrained(repo, torch_dtype=weight_type)

# Optional: use the original (larger) VAE stored in the repo's "vae_large" subfolder.
# pipe.vae = AutoencoderKL.from_pretrained(repo, subfolder="vae_large", torch_dtype=weight_type)
 
pipe.to("cuda")

prompt = "portrait photo of a girl, photograph, highly detailed face, depth of field, moody light, golden hour"

# Use exactly 1 inference step and set guidance_scale (CFG) to 0.
image = pipe(
    prompt, 
    num_inference_steps=1, 
    guidance_scale=0,
    generator=torch.Generator(device="cuda").manual_seed(seed)
).images[0]

image.save("output.png")
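
To compare several outputs for one prompt, the same one-step call can be repeated with different seeds. The sketch below assumes a pipeline loaded as in the usage code above. The names `ONE_STEP_KWARGS`, `output_name`, and `generate_for_seeds` are hypothetical helpers introduced only for illustration.

```python
# Settings required by this model: a single inference step and CFG set to 0.
ONE_STEP_KWARGS = {"num_inference_steps": 1, "guidance_scale": 0}


def output_name(seed: int, prefix: str = "output") -> str:
    """Build a file name that records the seed used for the image."""
    return f"{prefix}_{seed:04d}.png"


def generate_for_seeds(pipe, prompt: str, seeds, device: str = "cuda"):
    """Generate and save one image per seed; return the saved file names."""
    import torch  # local import so the helpers above need no dependencies

    paths = []
    for seed in seeds:
        # A fresh, explicitly seeded generator makes each image reproducible.
        generator = torch.Generator(device=device).manual_seed(seed)
        image = pipe(prompt, generator=generator, **ONE_STEP_KWARGS).images[0]
        path = output_name(seed)
        image.save(path)
        paths.append(path)
    return paths
```

For example, `generate_for_seeds(pipe, prompt, [42, 43, 44])` would write three files, one per seed, into the current directory.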

Cite Our Work

@article{song2024sdxs,
  author    = {Yuda Song and Zehao Sun and Xuanwu Yin},
  title     = {SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions},
  journal   = {arXiv},
  year      = {2024}
}