---
language:
- en
pipeline_tag: text-to-image
library_name: diffusers
tags:
- lora
---

# You Only Sample Once (YOSO)

![overview](overview.jpg)

YOSO was proposed in "[You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs](https://www.arxiv.org/abs/2403.12931)" by *Yihong Luo, Xiaolong Chen, Xinghua Qu, Jing Tang*.

Official repository of this paper: [YOSO](https://github.com/Luo-Yihong/YOSO).

## Note

**This is our old-version LoRA.** We have re-trained YOSO-LoRA with more computational resources and better data, achieving better one-step performance. Check the [technical report](https://www.arxiv.org/abs/2403.12931) for more details! The newly trained LoRA may be released in the next few months.

## Usage

### 1-step inference

1-step inference is currently supported only with SD v1.5. You should also prepare the informative initialization described in the paper for better results (a rough sketch is given below).

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 1
bs = 1

latents = ...  # latent codes of real images or SD generations (see the sketch below)
T = ...        # noising timestep for the informative initialization (see the paper)

# Build the informative initialization: perturb the mean latent, then add noise at timestep T.
latent_mean = latents.mean(dim=0)
init_latent = latent_mean.repeat(bs, 1, 1, 1)
init_latent = init_latent + latents.std() * torch.randn_like(init_latent)
noise = torch.randn_like(init_latent)
input_latent = pipeline.scheduler.add_noise(init_latent, noise, T)

imgs = pipeline(
    prompt="A photo of a dog",
    num_inference_steps=steps,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.5,
    latents=input_latent,
)[0]
imgs
```

Simple inference without the informative initialization also works, but with worse quality:

```python
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 1
imgs = pipeline(
    prompt="A photo of a corgi in forest, highly detailed, 8k, XT3.",
    num_inference_steps=steps,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.,
)[0]
imgs[0]
```

![Corgi](corgi.jpg)
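The first snippet above leaves `latents` (and the timestep `T`) unspecified. As a rough, hypothetical sketch of one way to obtain latent codes from real images (not necessarily the exact recipe from the paper), you can encode reference images with the pipeline's VAE; `encode_image` and `image_paths` below are illustrative placeholders:

```python
# Hypothetical sketch: encode real reference images into SD latent codes.
# Reuses the `pipeline` object created above.
import numpy as np
import torch
from PIL import Image

def encode_image(path):
    # Load a 512x512 RGB image and scale pixel values to [-1, 1].
    img = Image.open(path).convert("RGB").resize((512, 512))
    x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0
    x = x.permute(2, 0, 1).unsqueeze(0).to(pipeline.device, dtype=pipeline.vae.dtype)
    # Encode with the VAE and apply the SD latent scaling factor.
    with torch.no_grad():
        z = pipeline.vae.encode(x).latent_dist.sample()
    return z * pipeline.vae.config.scaling_factor

image_paths = [...]  # your own reference images
latents = torch.cat([encode_image(p) for p in image_paths], dim=0)  # shape [N, 4, 64, 64]
```

Alternatively, latents collected from multi-step SD generations (e.g., by passing `output_type="latent"` to the pipeline) can serve the same purpose.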
### 2-step inference

We note that a small CFG scale can be used to enhance image quality.

```python
pipeline = DiffusionPipeline.from_pretrained("stablediffusionapi/realistic-vision-v51", torch_dtype=torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
generator = torch.manual_seed(318)
steps = 2
imgs = pipeline(
    prompt="A photo of a man, XT3",
    num_inference_steps=steps,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.5,
)[0]
imgs
```

![man](man.jpg)

Moreover, we observe that when combined with new base models, YOSO-LoRA can be used with some advanced ODE solvers:

```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipeline = DiffusionPipeline.from_pretrained("stablediffusionapi/realistic-vision-v51", torch_dtype=torch.float16)
pipeline = pipeline.to('cuda')
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')
pipeline.scheduler = DPMSolverMultistepScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
generator = torch.manual_seed(323)
steps = 2
imgs = pipeline(
    prompt="A photo of a girl, XT3",
    num_inference_steps=steps,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.5,
)[0]
imgs[0]
```

![girl](girl.jpg)

We encourage you to experiment with various solvers to obtain better samples. We will try to improve the compatibility of YOSO-LoRA with different solvers.

You may also try some interesting applications, such as varying an attribute in the prompt:

```python
from diffusers.utils import make_image_grid

generator = torch.manual_seed(318)
steps = 2
img_list = []
for age in [2, 20, 30, 50, 60, 80]:
    imgs = pipeline(
        prompt=f"A photo of a cute girl, {age} yr old, XT3",
        num_inference_steps=steps,
        num_images_per_prompt=1,
        generator=generator,
        guidance_scale=1.1,
    )[0]
    img_list.append(imgs[0])
make_image_grid(img_list, rows=1, cols=len(img_list))
```

![life](life.jpg)

You can increase the number of steps to improve sample quality.

## Bibtex

```
@misc{luo2024sample,
      title={You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs},
      author={Yihong Luo and Xiaolong Chen and Xinghua Qu and Jing Tang},
      year={2024},
      eprint={2403.12931},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```