---
license: openrail++
tags:
- text-to-image
- stable-diffusion
library_name: diffusers
inference: false
---
# SDXS-512-0.9
SDXS is a model that generates high-resolution images from text prompts in real time. It is trained using score distillation and feature matching. For more information, please refer to our research paper: [SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions](https://arxiv.org/abs/2403.16627). We open-source the model as part of the research.
SDXS-512-0.9 is an **old version** of SDXS-512. For now we are releasing only this version, and we will release other versions gradually.
**To avoid possible risks (especially commercial and copyright risks), SDXS-512-1.0 and SDXS-1024-1.0 will not be released in the near future. As an alternative, we will provide new versions with the same generation quality. We expect to have a new version of SDXS-512 available within a week, with almost the same generation quality as SDXS-512-1.0.**
Model Information:
- Teacher DM: [SD Turbo](https://huggingface.co/stabilityai/sd-turbo)
- Offline DM: [SD v2.1 base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base)
- VAE: [TAESD](https://huggingface.co/madebyollin/taesd)
This model differs from version 1.0 in three respects:
1. This version employs TAESD, which may produce low-quality images when `weight_type` is `float16`. Our own image decoder is not compatible with the current version of diffusers, so we are not providing it for now.
2. This version was not fine-tuned with LoRA-GAN, as described in the implementation details section of the paper, which may result in slightly inferior image details.
3. This version replaces self-attention with cross-attention in the highest-resolution stages, which adds only minimal overhead compared with removing those layers outright.
There is a third-party [Demo](https://huggingface.co/spaces/ameerazam08/SDXS-GPU-Demo) from @ameerazam08. We'll provide an official demo when 1.0 is officially released, which hopefully won't be long.
## Diffusers Usage
![](output.png)
```python
import torch
from diffusers import StableDiffusionPipeline, AutoencoderKL
repo = "IDKiro/sdxs-512-0.9"
seed = 42
weight_type = torch.float32 # or float16
# Load model.
pipe = StableDiffusionPipeline.from_pretrained(repo, torch_dtype=weight_type)
# Optional: use the original VAE from the repo's vae_large subfolder.
# pipe.vae = AutoencoderKL.from_pretrained(repo, subfolder="vae_large", torch_dtype=weight_type)
pipe.to("cuda")
prompt = "portrait photo of a girl, photograph, highly detailed face, depth of field, moody light, golden hour"
# Use exactly 1 inference step and set CFG (guidance_scale) to 0.
image = pipe(
    prompt,
    num_inference_steps=1,
    guidance_scale=0,
    generator=torch.Generator(device="cuda").manual_seed(seed),
).images[0]
image.save("output.png")
```
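The two constraints stated above (float16 may degrade images with TAESD, and sampling must use one step with CFG disabled) can be gathered into small helpers so they are not forgotten at call sites. The following is a minimal sketch, not part of the official release: the names `choose_weight_type`, `one_step_call_kwargs`, and `generate` are hypothetical, and treating float32 as the default is an assumption based on the float16 caveat.

```python
def choose_weight_type(prefer_speed: bool = False) -> str:
    """Return the name of the torch dtype to load the pipeline with.

    Assumption: float32 is the safe default because TAESD may produce
    low-quality images in float16; float16 is opt-in.
    """
    return "float16" if prefer_speed else "float32"


def one_step_call_kwargs() -> dict:
    """Sampling arguments the one-step model expects: 1 step, CFG off."""
    return {"num_inference_steps": 1, "guidance_scale": 0}


def generate(prompt: str, seed: int = 42, prefer_speed: bool = False):
    """Hypothetical wrapper around the usage example above."""
    # Heavy imports stay local so the helpers above work without a GPU.
    import torch
    from diffusers import StableDiffusionPipeline

    dtype = getattr(torch, choose_weight_type(prefer_speed))
    pipe = StableDiffusionPipeline.from_pretrained(
        "IDKiro/sdxs-512-0.9", torch_dtype=dtype
    )
    pipe.to("cuda")
    # A seeded generator makes the single-step sample reproducible.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    return pipe(prompt, generator=generator, **one_step_call_kwargs()).images[0]
```

Calling `generate("a photo of a cat").save("cat.png")` would then run the same one-step sampling as the example above, assuming a CUDA device is available.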
## Cite Our Work
```
@article{song2024sdxs,
  author = {Yuda Song and Zehao Sun and Xuanwu Yin},
title = {SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions},
  journal = {arXiv preprint arXiv:2403.16627},
year = {2024},
}
```