|
--- |
|
license: openrail++ |
|
tags: |
|
- text-to-image |
|
- stable-diffusion |
|
library_name: diffusers |
|
inference: false |
|
--- |
|
|
|
# SDXS-512-0.9 |
|
|
|
SDXS is a model that can generate high-resolution images in real-time based on prompt texts, trained using score distillation and feature matching. For more information, please refer to our research paper: [SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions](https://arxiv.org/abs/2403.16627). We open-source the model as part of the research. |
|
|
|
SDXS-512-0.9 is a **old version** of SDXS-512. For some reasons, we are only releasing this version for the time being, and will gradually release other versions. |
|
|
|
**In order to avoid some possible risks (especially commercial and copyright risks), the SDXS-512-1.0 and SDXS-1024-1.0 will not be available shortly, and as an alternative we will provide new versions with the same generation quality. We expect to have a new version of SDXS-512 available within a week, with almost the same generation quality as SDXS-512-1.0.** |
|
|
|
Model Information: |
|
- Teacher DM: [SD Turbo](https://huggingface.co/stabilityai/sd-turbo) |
|
- Offline DM: [SD v2.1 base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base) |
|
- VAE: [TAESD](https://huggingface.co/madebyollin/taesd) |
|
|
|
The main differences between this model and version 1.0 are in three aspects: |
|
1. This version employs TAESD, which may produce low-quality images when weight_type is float16. Our image decoder is not compatible with the current version of diffusers, so it will not be provided now. |
|
2. This version did not perform the LoRA-GAN finetune mentioned in the implementation details section, which may result in slightly inferior image details. |
|
3. This version replaces self-attention with cross-attention in the highest resolution stages, which introduces minimal overhead compared to directly removing them. |
|
|
|
There is a third-party [Demo](https://huggingface.co/spaces/ameerazam08/SDXS-GPU-Demo) from @ameerazam08. We'll provide an official demo when 1.0 is officially released, which hopefully won't be long. |
|
|
|
## Diffusers Usage |
|
|
|
![](output.png) |
|
|
|
```python |
|
import torch |
|
from diffusers import StableDiffusionPipeline, AutoencoderKL |
|
|
|
repo = "IDKiro/sdxs-512-0.9" |
|
seed = 42 |
|
weight_type = torch.float32 # or float16 |
|
|
|
# Load model. |
|
pipe = StableDiffusionPipeline.from_pretrained(repo, torch_dtype=weight_type) |
|
|
|
# use original VAE |
|
# pipe.vae = AutoencoderKL.from_pretrained("IDKiro/sdxs-512-0.9/vae_large") |
|
|
|
pipe.to("cuda") |
|
|
|
prompt = "portrait photo of a girl, photograph, highly detailed face, depth of field, moody light, golden hour" |
|
|
|
# Ensure using 1 inference step and CFG set to 0. |
|
image = pipe( |
|
prompt, |
|
num_inference_steps=1, |
|
guidance_scale=0, |
|
generator=torch.Generator(device="cuda").manual_seed(seed) |
|
).images[0] |
|
|
|
image.save("output.png") |
|
``` |
|
|
|
## Cite Our Work |
|
|
|
``` |
|
@article{song2024sdxs, |
|
author = {Yuda Song, Zehao Sun, Xuanwu Yin}, |
|
title = {SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions}, |
|
journal = {arxiv}, |
|
year = {2024}, |
|
} |
|
``` |
|
|