File size: 3,881 Bytes
fb21dd2 454615f fb21dd2 ddd56e8 410c9af d175329 ddd56e8 13645f4 ddd56e8 3f523df ee34371 6487dd2 bb3264f 6487dd2 7cf8c81 6487dd2 4dff679 6487dd2 3f523df 581062e bed2421 581062e 6262708 581062e 6262708 581062e 00bed5d 3f523df ddd56e8 7cf8c81 ddd56e8 6109f69 b408920 0c7a11a 7ef0e6c |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 |
---
language:
- en
pipeline_tag: text-to-image
library_name: diffusers
---
# You Only Sample Once (YOSO)
![overview](overview.jpg)
The YOSO was proposed in "[You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs](https://www.arxiv.org/abs/2403.12931)" by *Yihong Luo, Xiaolong Chen, Jing Tang*.
Official Repository of this paper: [YOSO](https://github.com/Luo-Yihong/YOSO).
## Usage
### 1-step inference
1-step inference is only allowed based on SD v1.5 for now. And you should prepare the informative initialization according to the paper for better results.
```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype = torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 1
bs = 1
latents = ... # maybe some latent codes of real images or SD generation
latent_mean = latent.mean(dim=0)
noise = torch.randn([1,bs,64,64])
input_latent = pipeline.scheduler.add_noise(latent_mean.repeat(bs,1,1,1),noise,T)
imgs= pipeline(prompt="A photo of a dog",
num_inference_steps=steps,
num_images_per_prompt = 1,
generator = generator,
guidance_scale=1.5,
latents = input_latent,
)[0]
imgs
```
The simple inference without informative initialization, but worse quality:
```python
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype = torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 1
imgs = pipeline(prompt="A photo of a corgi in forest, highly detailed, 8k, XT3.",
num_inference_steps=1,
num_images_per_prompt = 1,
generator = generator,
guidance_scale=1.,
)[0]
imgs[0]
```
![Corgi](corgi.jpg)
### 2-step inference
We note that a small CFG can be used to enhance the image quality.
```python
pipeline = DiffusionPipeline.from_pretrained("stablediffusionapi/realistic-vision-v51", torch_dtype = torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 2
imgs= pipeline(prompt="A photo of a man, XT3",
num_inference_steps=steps,
num_images_per_prompt = 1,
generator = generator,
guidance_scale=1.5,
)[0]
imgs
```
![man](man.jpg)
You may try some interesting applications, like:
```python
generator = torch.manual_seed(318)
steps = 2
img_list = []
for age in [2,20,30,50,60,80]:
imgs = pipeline(prompt=f"A photo of a cute girl, {age} yr old, XT3",
num_inference_steps=steps,
num_images_per_prompt = 1,
generator = generator,
guidance_scale=1.1,
)[0]
img_list.append(imgs[0])
make_image_grid(img_list,rows=1,cols=len(img_list))
```
![life](life.jpg)
You can increase the steps to improve sample quality.
## Bibtex
```
@misc{luo2024sample,
title={You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs},
author={Yihong Luo and Xiaolong Chen and Jing Tang},
booktitle={arXiv preprint arxiv:2403.12931},
year={2024},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
``` |