---
language:
- en
pipeline_tag: text-to-image
library_name: diffusers
tags:
- lora
---

# You Only Sample Once (YOSO)

![overview](overview.jpg)

YOSO was proposed in "[You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs](https://www.arxiv.org/abs/2403.12931)" by *Yihong Luo, Xiaolong Chen, Jing Tang*.

Official repository for the paper: [YOSO](https://github.com/Luo-Yihong/YOSO).

## Usage

### 1-step inference

1-step inference is currently supported only with SD v1.5 as the base model. For better results, prepare the informative initialization described in the paper.

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 1
bs = 1

# Informative initialization: start from the mean of some reference latents
latents = ... # maybe some latent codes of real images or SD generation
latent_mean = latents.mean(dim=0)
init_latent = latent_mean.repeat(bs, 1, 1, 1)
init_latent = init_latent + latents.std() * torch.randn_like(init_latent)
noise = torch.randn_like(init_latent)  # shape [bs, 4, 64, 64] for SD v1.5
# Assumption: T is the final training timestep (999 for the 1000-step schedule);
# the original snippet leaves T undefined.
T = torch.tensor([999])
input_latent = pipeline.scheduler.add_noise(init_latent, noise, T)
# Match the device and dtype of the fp16 pipeline
input_latent = input_latent.to(device='cuda', dtype=torch.float16)

imgs = pipeline(prompt="A photo of a dog",
                num_inference_steps=steps,
                num_images_per_prompt=bs,  # must match the latent batch size
                generator=generator,
                guidance_scale=1.5,
                latents=input_latent,
               )[0]
imgs
```
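
To see why the starting latent matters, it helps to look at what `add_noise` does at the last timestep. The sketch below is an illustration, not part of the YOSO code: it assumes the standard forward-diffusion rule `x_T = sqrt(alpha_bar_T) * x_0 + sqrt(1 - alpha_bar_T) * noise` and the scaled-linear beta schedule that SD v1.5 ships with (`beta_start=0.00085`, `beta_end=0.012`, 1000 training steps). The helper name `sqrt_alpha_bar` is hypothetical.

```python
import math

def sqrt_alpha_bar(t, num_train_timesteps=1000, beta_start=0.00085, beta_end=0.012):
    """Return sqrt(alpha_bar_t) for a scaled-linear beta schedule (assumed SD v1.5 defaults)."""
    # Scaled-linear: betas are spaced linearly in sqrt-space, then squared
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    alpha_bar = 1.0
    for i in range(t + 1):
        beta = (lo + (hi - lo) * i / (num_train_timesteps - 1)) ** 2
        alpha_bar *= 1.0 - beta
    return math.sqrt(alpha_bar)

# Weights applied to the initial latent and to the noise at the final timestep
signal_weight = sqrt_alpha_bar(999)
noise_weight = math.sqrt(1.0 - signal_weight ** 2)
print(f"signal weight: {signal_weight:.3f}, noise weight: {noise_weight:.3f}")
```

Under these assumptions the signal weight at the final timestep is small but not zero, so the noised input still carries a trace of `init_latent`. This is one way to read why an informative `init_latent` can help a single-step sampler.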

Simple inference without informative initialization is also possible, at the cost of lower quality:

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 1

imgs = pipeline(prompt="A photo of a corgi in forest, highly detailed, 8k, XT3.",
                num_inference_steps=steps,
                num_images_per_prompt=1,
                generator=generator,
                guidance_scale=1.,
               )[0]
imgs[0]
```

![Corgi](corgi.jpg)

### 2-step inference

A small classifier-free guidance (CFG) scale can be used to improve image quality.

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipeline = DiffusionPipeline.from_pretrained("stablediffusionapi/realistic-vision-v51", torch_dtype=torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 2

imgs = pipeline(prompt="A photo of a man, XT3",
                num_inference_steps=steps,
                num_images_per_prompt=1,
                generator=generator,
                guidance_scale=1.5,
               )[0]
imgs[0]
```

![man](man.jpg)
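
For readers unfamiliar with what `guidance_scale` controls, the sketch below shows the usual classifier-free guidance combination on plain Python lists. It illustrates the general formula, not the pipeline's internals, and `cfg_combine` is a hypothetical helper name.

```python
def cfg_combine(noise_uncond, noise_cond, guidance_scale):
    """Blend unconditional and text-conditional noise predictions."""
    # Move from the unconditional prediction toward (and past) the conditional one
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]

uncond = [0.0, 2.0]  # toy unconditional prediction
cond = [1.0, 4.0]    # toy text-conditional prediction

print(cfg_combine(uncond, cond, 1.0))  # [1.0, 4.0]
print(cfg_combine(uncond, cond, 1.5))  # [1.5, 5.0]
```

A scale of 1 returns the conditional prediction unchanged, which corresponds to `guidance_scale=1.` in the 1-step example. A scale of 1.5 extrapolates slightly past the conditional prediction, which is the "small CFG" used in the 2-step example.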

Moreover, when combined with new base models, YOSO-LoRA can be used with some advanced ODE solvers:

```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipeline = DiffusionPipeline.from_pretrained("stablediffusionapi/realistic-vision-v51", torch_dtype=torch.float16)
pipeline = pipeline.to('cuda')
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
pipeline.scheduler = DPMSolverMultistepScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
generator = torch.manual_seed(323)
steps = 2

imgs = pipeline(prompt="A photo of a girl, XT3",
                num_inference_steps=steps,
                num_images_per_prompt=1,
                generator=generator,
                guidance_scale=1.5,
               )[0]
imgs[0]
```

![girl](girl.jpg)

We encourage you to experiment with different solvers to obtain better samples. We will try to improve YOSO-LoRA's compatibility with different solvers.

You can also try some interesting applications, such as generating a subject at different ages:

```python
from diffusers.utils import make_image_grid

# Reuses `pipeline` from the previous example
generator = torch.manual_seed(318)
steps = 2
img_list = []
for age in [2, 20, 30, 50, 60, 80]:
    imgs = pipeline(prompt=f"A photo of a cute girl, {age} yr old, XT3",
                    num_inference_steps=steps,
                    num_images_per_prompt=1,
                    generator=generator,
                    guidance_scale=1.1,
                   )[0]
    img_list.append(imgs[0])
make_image_grid(img_list, rows=1, cols=len(img_list))
```

![life](life.jpg)

Increasing the number of inference steps can improve sample quality.
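
To make the effect of the step count concrete, the sketch below lists which timesteps a sampler visits when it picks evenly spaced points from a coarse grid, in the style of LCM schedulers. It is a standalone illustration, not the scheduler's own code. It assumes 1000 training timesteps and a coarse grid of 50 steps, and `lcm_style_timesteps` is a hypothetical helper name.

```python
def lcm_style_timesteps(num_inference_steps, num_train_timesteps=1000, original_steps=50):
    """Pick evenly spaced timesteps from a coarse grid, noisiest first."""
    k = num_train_timesteps // original_steps  # spacing of the coarse grid
    # Coarse grid in ascending order: k-1, 2k-1, ..., num_train_timesteps-1
    grid = [i * k - 1 for i in range(1, original_steps + 1)]
    grid.reverse()  # sampling runs from the noisiest timestep downward
    # Evenly spaced indices into the grid, endpoint excluded
    stride = len(grid) / num_inference_steps
    return [grid[int(i * stride)] for i in range(num_inference_steps)]

for steps in (1, 2, 4):
    print(steps, lcm_style_timesteps(steps))
# 1 [999]
# 2 [999, 499]
# 4 [999, 759, 499, 259]
```

Under these assumptions, every extra step adds an intermediate timestep between the noisiest point and the final image, giving the model more chances to refine the sample.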

## Bibtex

```
@misc{luo2024sample,
      title={You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs},
      author={Yihong Luo and Xiaolong Chen and Jing Tang},
      year={2024},
      eprint={2403.12931},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```