metadata
language:
- en
pipeline_tag: text-to-image
You Only Sample Once (YOSO)
The YOSO was proposed in You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs by Yihong Luo, Xiaolong Chen, Jing Tang.
Usage
1-step inference
1-step inference is only allowed based on SD v1.5 for now. And you should prepare the informative initialization according to the paper for better results.
import torch
from diffusers import DiffusionPipeline, LCMScheduler
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype = torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 1
bs = 1
latents = ... # maybe some latent codes of real images or SD generation
latent_mean = latent.mean(dim=0)
noise = torch.randn([1,bs,64,64])
input_latent = pipeline.scheduler.add_noise(latent_mean.repeat(bs,1,1,1),noise,T)
imgs= pipeline(prompt="A photo of a dog",
num_inference_steps=steps,
num_images_per_prompt = 1,
generator = generator,
guidance_scale=1.5,
latents = input_latent,
)[0]
imgs
The simple inference without informative initialization, but worse quality:
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype = torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 1
imgs = pipeline(prompt="A photo of a corgi in forest, highly detailed, 8k, XT3.",
num_inference_steps=1,
num_images_per_prompt = 1,
generator = generator,
guidance_scale=1.,
)[0]
imgs[0]
2-step inference
We note that a small CFG can be used to enhance the image quality.
pipeline = DiffusionPipeline.from_pretrained("stablediffusionapi/realistic-vision-v51", torch_dtype = torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 2
imgs= pipeline(prompt="A photo of a man, XT3",
num_inference_steps=steps,
num_images_per_prompt = 1,
generator = generator,
guidance_scale=1.5,
)[0]
imgs
You may try some interesting applications, like:
generator = torch.manual_seed(318)
steps = 2
img_list = []
for age in [2,20,30,50,60,80]:
imgs = pipeline(prompt=f"A photo of a cute girl, {age} yr old, XT3",
num_inference_steps=steps,
num_images_per_prompt = 1,
generator = generator,
guidance_scale=1.1,
)[0]
img_list.append(imgs[0])
make_image_grid(img_list,rows=1,cols=len(img_list))
You can increase the steps to improve sample quality.