---
language:
- en
pipeline_tag: text-to-image
---
# You Only Sample Once (YOSO)
![overview](overview.jpg)
YOSO was proposed in "You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs" by *Yihong Luo, Xiaolong Chen, Jing Tang*.
## Usage
### 1-step inference
1-step inference is currently supported only with SD v1.5 as the base model. For better results, prepare the informative initialization described in the paper.
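```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 1
bs = 1
latents = ... # maybe some latent codes of real images or SD generation, shape [N, 4, 64, 64]
latent_mean = latents.mean(dim=0)  # average over the batch -> [4, 64, 64]
noise = torch.randn([bs, 4, 64, 64])  # SD v1.5 latents have 4 channels
# Assumption: the original snippet leaves T undefined. 999 should be the timestep
# LCMScheduler uses for a single inference step with the SD v1.5 config.
T = torch.tensor([999])
input_latent = pipeline.scheduler.add_noise(latent_mean.repeat(bs, 1, 1, 1), noise, T)
# Match the pipeline's device and dtype before passing the latents in.
input_latent = input_latent.to('cuda', dtype=torch.float16)
imgs = pipeline(prompt="A photo of a dog",
                num_inference_steps=steps,
                num_images_per_prompt=1,
                generator=generator,
                guidance_scale=1.5,
                latents=input_latent,
                )[0]
imgs
```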
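For intuition, `add_noise` in the block above applies the standard forward-diffusion rule `sqrt(alpha_bar_T) * x0 + sqrt(1 - alpha_bar_T) * noise` to the mean latent. Below is a minimal CPU-only sketch of that step. It assumes random tensors stand in for real latents, and it uses a hypothetical value for `alpha_bar_T`. The helper name `informative_init` is illustrative and is not part of diffusers or YOSO.

```python
import torch

def informative_init(latents, noise, alpha_bar_T):
    """Noise the batch-mean latent to level alpha_bar_T (standard forward diffusion)."""
    bs = noise.shape[0]
    latent_mean = latents.mean(dim=0, keepdim=True)   # [1, 4, 64, 64]
    x0 = latent_mean.expand(bs, -1, -1, -1)           # same mean for every sample
    return alpha_bar_T ** 0.5 * x0 + (1.0 - alpha_bar_T) ** 0.5 * noise

# Stand-in data (assumption): 8 random "latents" in place of encoded real images.
latents = torch.randn(8, 4, 64, 64)
noise = torch.randn(2, 4, 64, 64)
alpha_bar_T = 0.005  # hypothetical; near the end of the schedule little signal remains
input_latent = informative_init(latents, noise, alpha_bar_T)
print(input_latent.shape)  # torch.Size([2, 4, 64, 64])
```

Because `alpha_bar_T` is small at the final timestep, the result is mostly noise with a faint bias toward the mean latent, which is presumably what makes the initialization "informative".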
Simpler inference without the informative initialization also works, at the cost of lower quality:
```python
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype = torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 1
imgs = pipeline(prompt="A photo of a corgi in forest, highly detailed, 8k, XT3.",
num_inference_steps=1,
num_images_per_prompt = 1,
generator = generator,
guidance_scale=1.,
)[0]
imgs[0]
```
![Corgi](corgi.jpg)
### 2-step inference
A small classifier-free guidance (CFG) scale can be used to enhance image quality.
```python
pipeline = DiffusionPipeline.from_pretrained("stablediffusionapi/realistic-vision-v51", torch_dtype = torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 2
imgs= pipeline(prompt="A photo of a man, XT3",
num_inference_steps=steps,
num_images_per_prompt = 1,
generator = generator,
guidance_scale=1.5,
)[0]
imgs
```
![man](man.jpg)
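For reference, classifier-free guidance blends the unconditional and text-conditioned noise predictions, and `guidance_scale` sets the blend weight. Below is a minimal sketch of the standard combination rule, using random tensors in place of real UNet outputs. The function name `apply_cfg` is illustrative, not a diffusers API.

```python
import torch

def apply_cfg(eps_uncond, eps_text, guidance_scale):
    """Standard CFG: move from the unconditional prediction toward the text one."""
    return eps_uncond + guidance_scale * (eps_text - eps_uncond)

# Random stand-ins for the two UNet noise predictions (assumption).
eps_uncond = torch.randn(1, 4, 64, 64)
eps_text = torch.randn(1, 4, 64, 64)

# A scale of 1.0 returns the text-conditioned prediction unchanged (no guidance).
no_guidance = apply_cfg(eps_uncond, eps_text, 1.0)
# A small scale such as 1.5 extrapolates slightly past it.
guided = apply_cfg(eps_uncond, eps_text, 1.5)
```

This is why `guidance_scale=1.` in the simple 1-step example amounts to running without CFG, while values such as 1.1 or 1.5 apply a mild amount of guidance.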
You may try some interesting applications, like:
```python
from diffusers.utils import make_image_grid

generator = torch.manual_seed(318)
steps = 2
img_list = []
# Generate one image per age, reusing the pipeline from the 2-step example.
for age in [2, 20, 30, 50, 60, 80]:
    imgs = pipeline(prompt=f"A photo of a cute girl, {age} yr old, XT3",
                    num_inference_steps=steps,
                    num_images_per_prompt=1,
                    generator=generator,
                    guidance_scale=1.1,
                    )[0]
    img_list.append(imgs[0])
make_image_grid(img_list, rows=1, cols=len(img_list))
```
![life](life.jpg)
You can increase the number of steps to improve sample quality.
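One way to compare step counts is to rerun the same prompt with a fixed seed and vary only `num_inference_steps`. Below is a minimal sketch. The helper `sweep_steps` is hypothetical, and it assumes `pipeline` is any callable with the same keyword arguments as the pipelines above.

```python
import torch

def sweep_steps(pipeline, prompt, step_counts, seed=318, guidance_scale=1.5):
    """Generate one image per step count, reseeding so only the step count varies."""
    results = {}
    for steps in step_counts:
        generator = torch.manual_seed(seed)  # same starting noise for every run
        images = pipeline(prompt=prompt,
                          num_inference_steps=steps,
                          num_images_per_prompt=1,
                          generator=generator,
                          guidance_scale=guidance_scale)[0]
        results[steps] = images[0]
    return results

# Usage, assuming `pipeline` was built as in the 2-step example:
# results = sweep_steps(pipeline, "A photo of a man, XT3", [1, 2, 4, 8])
```

The returned dictionary maps each step count to its image, so the results can be viewed side by side.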