# You Only Sample Once (YOSO)

![overview](overview.jpg)

YOSO was proposed in *You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs* by Yihong Luo, Xiaolong Chen, and Jing Tang.

## Usage

### 1-step inference

For now, 1-step inference is only supported on top of SD v1.5, and you should prepare the informative initialization from the paper for better results: the snippet below noises the mean of some reference latents up to the terminal timestep and starts sampling from there.

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')

generator = torch.manual_seed(318)
steps = 1
bs = 1

# Reference latents, e.g. latent codes of real images or of SD generations,
# with shape [N, 4, 64, 64].
latents = ...
latent_mean = latents.mean(dim=0)

# Informative initialization: noise the mean latent up to the terminal
# training timestep (999 for SD v1.5) and start sampling from there.
noise = torch.randn([bs, 4, 64, 64], device=latent_mean.device, dtype=latent_mean.dtype)
T = torch.full((bs,), pipeline.scheduler.config.num_train_timesteps - 1, dtype=torch.long)
input_latent = pipeline.scheduler.add_noise(latent_mean.repeat(bs, 1, 1, 1), noise, T)

imgs = pipeline(
    prompt="A photo of a dog",
    num_inference_steps=steps,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.5,
    latents=input_latent,
)[0]
imgs[0]
```
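
Since `latents` above is left as a placeholder, here is a minimal sketch of one way to obtain such reference latents, by encoding real images with the pipeline's VAE (`ref_images` is a hypothetical `[N, 3, 512, 512]` image tensor normalized to `[-1, 1]`):

```python
# Sketch: encode reference images into the SD latent space with the VAE.
# `ref_images` is a hypothetical [N, 3, 512, 512] tensor in [-1, 1].
with torch.no_grad():
    posterior = pipeline.vae.encode(ref_images.to('cuda', torch.float16))
    latents = posterior.latent_dist.sample() * pipeline.vae.config.scaling_factor
```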

Simple inference without the informative initialization also works, but with worse quality:

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')

generator = torch.manual_seed(318)
steps = 1

imgs = pipeline(
    prompt="A photo of a corgi in forest, highly detailed, 8k, XT3.",
    num_inference_steps=steps,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.,
)[0]
imgs[0]
```

![Corgi](corgi.jpg)

### 2-step inference

We note that a small CFG scale can be used to enhance image quality. The YOSO LoRA can also be loaded into fine-tuned SD v1.5 checkpoints; the example below uses Realistic Vision v5.1:

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipeline = DiffusionPipeline.from_pretrained("stablediffusionapi/realistic-vision-v51", torch_dtype=torch.float16)
pipeline = pipeline.to('cuda')
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_lora_weights('Luo-Yihong/yoso_sd1.5_lora')

generator = torch.manual_seed(318)
steps = 2

imgs = pipeline(
    prompt="A photo of a man, XT3",
    num_inference_steps=steps,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.5,
)[0]
imgs[0]
```

![man](man.jpg)

You may try some interesting applications, such as sweeping the age in the prompt (this reuses the pipeline from the previous snippet):

```python
from diffusers.utils import make_image_grid

generator = torch.manual_seed(318)
steps = 2

img_list = []
for age in [2, 20, 30, 50, 60, 80]:
    imgs = pipeline(
        prompt=f"A photo of a cute girl, {age} yr old, XT3",
        num_inference_steps=steps,
        num_images_per_prompt=1,
        generator=generator,
        guidance_scale=1.1,
    )[0]
    img_list.append(imgs[0])

make_image_grid(img_list, rows=1, cols=len(img_list))
```

![life](life.jpg)

You can increase the number of inference steps to further improve sample quality.
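
As a minimal sketch, a 4-step run with the 2-step pipeline from above might look like this (the step count and CFG scale here are illustrative choices, not values from the paper):

```python
# Illustrative multi-step run; reuses `pipeline` from the 2-step example.
generator = torch.manual_seed(318)
imgs = pipeline(
    prompt="A photo of a man, XT3",
    num_inference_steps=4,  # more steps than the 1-/2-step examples
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.5,  # illustrative; small CFG as noted above
)[0]
imgs[0]
```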