File size: 5,340 Bytes

fb21dd2
 
 
 
454615f
f032160
 
fb21dd2
ddd56e8
410c9af
4552069
ddd56e8
13645f4
 
a1bffa3
 
fc30351
 
13645f4
ddd56e8
 
3f523df
ee34371
6487dd2
bb3264f
 
6487dd2
 
 
7cf8c81
6487dd2
 
 
 
 
b25d8b2
e87f526
b25d8b2
6487dd2
 
 
 
 
 
 
 
 
3f523df
581062e
 
 
 
 
bed2421
581062e
 
6262708
581062e
 
 
 
 
6262708
581062e
00bed5d
3f523df
 
 
ddd56e8
 
 
7cf8c81
ddd56e8
 
 
 
 
 
 
 
 
6109f69
b408920
0c7a11a
5efa73d
 
 
 
 
 
7422ed6
5efa73d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
0c7a11a
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
7ef0e6c
 
 
 
 
4552069
 
 
 
 
 
7ef0e6c

---
language:
- en
pipeline_tag: text-to-image
library_name: diffusers
tags:
- lora
---
# You Only Sample Once (YOSO)
![overview](overview.jpg)
The YOSO was proposed in "[You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs](https://www.arxiv.org/abs/2403.12931)" by *Yihong Luo, Xiaolong Chen, Xinghua Qu, Jing Tang*. 

Official Repository of this paper: [YOSO](https://github.com/Luo-Yihong/YOSO).

## Note
**This is our old-version LoRA**. We have re-trained the YOSO-LoRA via more computational resources and better data, achieving better one-step performance. Check the [technical report](https://www.arxiv.org/abs/2403.12931) for more details! The newly trained LoRA may be released in the next few months.



## Usage

### 1-step inference
1-step inference is only allowed based on SD v1.5 for now. And you should prepare the informative initialization according to the paper for better results.
```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype = torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 1
bs = 1
latents = ... # maybe some latent codes of real images or SD generation
latent_mean = latent.mean(dim=0)
init_latent = latent_mean.repeat(bs,1,1,1) + latents.std()*torch.randn_like(latents) 
noise = torch.randn([bs,4,64,64])
input_latent = pipeline.scheduler.add_noise(init_latent,noise,T)
imgs= pipeline(prompt="A photo of a dog",
                    num_inference_steps=steps, 
                    num_images_per_prompt = 1,
                        generator = generator,
                        guidance_scale=1.5,
                    latents = input_latent,
                   )[0]
imgs
```

The simple inference without informative initialization, but worse quality:
```python
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype = torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 1
imgs = pipeline(prompt="A photo of a corgi in forest, highly detailed, 8k, XT3.",
                    num_inference_steps=1, 
                    num_images_per_prompt = 1,
                        generator = generator,
                        guidance_scale=1.,
                   )[0]
imgs[0]
```
![Corgi](corgi.jpg)
### 2-step inference
We note that a small CFG can be used to enhance the image quality.
```python
pipeline = DiffusionPipeline.from_pretrained("stablediffusionapi/realistic-vision-v51", torch_dtype = torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 2
imgs= pipeline(prompt="A photo of a man, XT3",
                    num_inference_steps=steps, 
                    num_images_per_prompt = 1,
                        generator = generator,
                        guidance_scale=1.5,
                   )[0]
imgs
```
![man](man.jpg)

Moreover, it is observed that when combined with new base models, our YOSO-LoRA is able to use some advanced ode-solvers:
```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
pipeline = DiffusionPipeline.from_pretrained("stablediffusionapi/realistic-vision-v51", torch_dtype = torch.float16)
pipeline = pipeline.to('cuda')
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
pipeline.scheduler = DPMSolverMultistepScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
generator = torch.manual_seed(323)
steps = 2
imgs= pipeline(prompt="A photo of a girl, XT3",
                    num_inference_steps=steps, 
                    num_images_per_prompt = 1,
                        generator = generator,
                        guidance_scale=1.5,
                   )[0]
imgs[0]
```
![girl](girl.jpg)

We encourage you to experiment with various solvers to obtain better samples. We will try to improve the compatibility of the YOSO-LoRA with different solvers.

You may try some interesting applications, like:
```python
generator = torch.manual_seed(318)
steps = 2
img_list = []
for age in [2,20,30,50,60,80]:
    imgs = pipeline(prompt=f"A photo of a cute girl, {age} yr old, XT3",
                        num_inference_steps=steps, 
                        num_images_per_prompt = 1,
                            generator = generator,
                            guidance_scale=1.1,
                       )[0]
    img_list.append(imgs[0])
make_image_grid(img_list,rows=1,cols=len(img_list))
```
![life](life.jpg)

You can increase the steps to improve sample quality.

## Bibtex
```
@misc{luo2024sample,
      title={You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs}, 
      author={Yihong Luo and Xiaolong Chen and Xinghua Qu and Jing Tang},
      year={2024},
      eprint={2403.12931},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```