	---
	language:
	- en
	pipeline_tag: text-to-image
	library_name: diffusers
	tags:
	- lora
	---
	# You Only Sample Once (YOSO)
	![overview](overview.jpg)
	YOSO was proposed in "[You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs](https://www.arxiv.org/abs/2403.12931)" by Yihong Luo, Xiaolong Chen, Xinghua Qu, and Jing Tang.

	Official repository for the paper: [YOSO](https://github.com/Luo-Yihong/YOSO).

	## Note
	This is the old version of our LoRA. We have since retrained YOSO-LoRA with more computational resources and better data, which improves one-step performance. See the [technical report](https://www.arxiv.org/abs/2403.12931) for details. The newly trained LoRA may be released in the next few months.



	## Usage

	### 1-step inference
	1-step inference currently works only with SD v1.5 as the base model. For better results, prepare the informative initialization described in the paper.
	```python
	import torch
	from diffusers import DiffusionPipeline, LCMScheduler

	# Load SD v1.5, switch to the LCM scheduler, and attach the YOSO LoRA.
	pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
	pipeline = pipeline.to("cuda")
	pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
	pipeline.load_lora_weights("Luo-Yihong/yoso_sd1.5_lora")
	generator = torch.manual_seed(318)
	steps = 1
	bs = 1
	latents = ...  # maybe some latent codes of real images or SD generation
	T = ...  # timestep tensor for add_noise; left undefined in the original snippet, see the paper
	# Informative initialization: mean latent plus Gaussian noise scaled by the latent std.
	latent_mean = latents.mean(dim=0)
	init_latent = latent_mean.repeat(bs, 1, 1, 1)
	init_latent = init_latent + latents.std() * torch.randn_like(init_latent)
	noise = torch.randn_like(init_latent)
	input_latent = pipeline.scheduler.add_noise(init_latent, noise, T)
	imgs = pipeline(
	    prompt="A photo of a dog",
	    num_inference_steps=steps,
	    num_images_per_prompt=1,
	    generator=generator,
	    guidance_scale=1.5,
	    # Match the pipeline's device and dtype before sampling.
	    latents=input_latent.to("cuda", torch.float16),
	)[0]
	imgs[0]
	```
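For intuition about the `add_noise` call above, here is a minimal standard-library sketch of the forward-diffusion formula it applies, x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps, where a_t is the cumulative product of (1 - beta). It assumes the SD v1.5 schedule settings (`scaled_linear` betas from 0.00085 to 0.012 over 1000 training timesteps). The helper names and the choice of timestep 999 in the usage lines are illustrative assumptions, not values taken from the paper.

```python
import math

def scaled_linear_alphas_cumprod(beta_start=0.00085, beta_end=0.012, num_steps=1000):
    """Cumulative product of (1 - beta_t) for a scaled-linear beta schedule."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    alphas_cumprod, running = [], 1.0
    for i in range(num_steps):
        # Betas are spaced linearly in sqrt-space, then squared.
        beta = (lo + (hi - lo) * i / (num_steps - 1)) ** 2
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return alphas_cumprod

def add_noise_scalar(x0, eps, t, alphas_cumprod):
    """Forward diffusion for one scalar: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    a_t = alphas_cumprod[t]
    return math.sqrt(a_t) * x0 + math.sqrt(1.0 - a_t) * eps

acp = scaled_linear_alphas_cumprod()
# Hypothetical choice: the last training timestep, where noise dominates the signal.
signal_weight = math.sqrt(acp[999])
noise_weight = math.sqrt(1.0 - acp[999])
print(f"signal weight {signal_weight:.3f}, noise weight {noise_weight:.3f}")
```

At late timesteps the signal weight is small but nonzero, which is presumably why the mean and scale of `init_latent` can still influence the one-step result.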

	Simpler inference without informative initialization, at the cost of lower quality:
	```python
	import torch
	from diffusers import DiffusionPipeline, LCMScheduler

	pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
	pipeline = pipeline.to("cuda")
	pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
	pipeline.load_lora_weights("Luo-Yihong/yoso_sd1.5_lora")
	generator = torch.manual_seed(318)
	steps = 1
	imgs = pipeline(
	    prompt="A photo of a corgi in forest, highly detailed, 8k, XT3.",
	    num_inference_steps=steps,
	    num_images_per_prompt=1,
	    generator=generator,
	    guidance_scale=1.0,
	)[0]
	imgs[0]
	```
	![Corgi](corgi.jpg)
	### 2-step inference
	A small classifier-free guidance (CFG) scale can improve image quality.
	```python
	import torch
	from diffusers import DiffusionPipeline, LCMScheduler

	pipeline = DiffusionPipeline.from_pretrained("stablediffusionapi/realistic-vision-v51", torch_dtype=torch.float16)
	pipeline = pipeline.to("cuda")
	pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
	pipeline.load_lora_weights("Luo-Yihong/yoso_sd1.5_lora")
	generator = torch.manual_seed(318)
	steps = 2
	imgs = pipeline(
	    prompt="A photo of a man, XT3",
	    num_inference_steps=steps,
	    num_images_per_prompt=1,
	    generator=generator,
	    guidance_scale=1.5,
	)[0]
	imgs[0]
	```
	![man](man.jpg)
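
For reference, the `guidance_scale` argument controls classifier-free guidance. The sketch below shows the standard CFG combination on plain Python lists. The function name `cfg_combine` and the toy values are hypothetical; the real pipeline applies the same formula to noise-prediction tensors. In the diffusers Stable Diffusion pipeline, guidance is only active when `guidance_scale` is greater than 1, so a value of 1.0 effectively disables it.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: start at the unconditional prediction
    and move toward (or past) the text-conditioned one."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_uncond = [0.0, 1.0, -2.0]  # toy unconditional noise prediction
eps_cond = [1.0, 1.0, 0.0]     # toy text-conditioned noise prediction

# Scale 1.0 reproduces the conditional prediction.
print(cfg_combine(eps_uncond, eps_cond, 1.0))
# Scale 1.5 extrapolates beyond the conditional prediction.
print(cfg_combine(eps_uncond, eps_cond, 1.5))
```

Small scales such as 1.1 to 1.5 keep the extrapolation mild, which matches the values used in the examples here.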

	Moreover, we observe that when combined with new base models, YOSO-LoRA works with some more advanced ODE solvers:
	```python
	import torch
	from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

	pipeline = DiffusionPipeline.from_pretrained("stablediffusionapi/realistic-vision-v51", torch_dtype=torch.float16)
	pipeline = pipeline.to("cuda")
	pipeline.load_lora_weights("Luo-Yihong/yoso_sd1.5_lora")
	# Use the DPM-Solver scheduler with the SD v1.5 scheduler config.
	pipeline.scheduler = DPMSolverMultistepScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
	generator = torch.manual_seed(323)
	steps = 2
	imgs = pipeline(
	    prompt="A photo of a girl, XT3",
	    num_inference_steps=steps,
	    num_images_per_prompt=1,
	    generator=generator,
	    guidance_scale=1.5,
	)[0]
	imgs[0]
	```
	![girl](girl.jpg)

	We encourage you to experiment with various solvers to obtain better samples. We will try to improve the compatibility of the YOSO-LoRA with different solvers.
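
One way to compare solvers is to rebuild each scheduler from the pipeline's current config and sample with the same prompt and seed. The helper below is only a sketch: `compare_schedulers` and its `seed_fn` parameter are hypothetical names, and it relies only on the `from_config` pattern and pipeline call convention already used in the examples above. Not every scheduler is guaranteed to give good results at 1 or 2 steps.

```python
def compare_schedulers(pipeline, scheduler_classes, seed_fn, **call_kwargs):
    """Sample once per scheduler class and return {class name: first image}."""
    original = pipeline.scheduler
    base_config = original.config  # every scheduler is rebuilt from this config
    results = {}
    try:
        for cls in scheduler_classes:
            pipeline.scheduler = cls.from_config(base_config)
            # A fresh generator per run keeps the initial noise identical.
            output = pipeline(generator=seed_fn(), **call_kwargs)
            results[cls.__name__] = output[0][0]
    finally:
        pipeline.scheduler = original  # restore the caller's scheduler
    return results

# Example usage (assumes `pipeline` and `torch` are set up as in the blocks above):
# from diffusers import DPMSolverMultistepScheduler, EulerDiscreteScheduler
# images = compare_schedulers(
#     pipeline,
#     [DPMSolverMultistepScheduler, EulerDiscreteScheduler],
#     seed_fn=lambda: torch.manual_seed(323),
#     prompt="A photo of a girl, XT3",
#     num_inference_steps=2,
#     guidance_scale=1.5,
# )
```

Restoring the original scheduler in `finally` keeps later cells in a notebook unaffected by the sweep.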

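	You can also try some interesting applications, such as varying one attribute in the prompt:
	```python
	import torch
	from diffusers.utils import make_image_grid

	# Reuses the `pipeline` loaded in the previous block.
	generator = torch.manual_seed(318)
	steps = 2
	img_list = []
	for age in [2, 20, 30, 50, 60, 80]:
	    imgs = pipeline(
	        prompt=f"A photo of a cute girl, {age} yr old, XT3",
	        num_inference_steps=steps,
	        num_images_per_prompt=1,
	        generator=generator,
	        guidance_scale=1.1,
	    )[0]
	    img_list.append(imgs[0])
	# Show all ages side by side in a single row.
	make_image_grid(img_list, rows=1, cols=len(img_list))
	```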
	![life](life.jpg)

	You can increase the steps to improve sample quality.

	## Bibtex
	```
	@misc{luo2024sample,
	  title={You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs},
	  author={Yihong Luo and Xiaolong Chen and Xinghua Qu and Jing Tang},
	  year={2024},
	  eprint={2403.12931},
	  archivePrefix={arXiv},
	  primaryClass={cs.CV}
	}
	```