<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# LoRA Support in Diffusers

Diffusers supports LoRA for faster fine-tuning of Stable Diffusion, allowing greater memory efficiency and easier portability.

Low-Rank Adaptation of Large Language Models was first introduced by Microsoft in
[LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) by *Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen*.
In a nutshell, LoRA allows adapting pretrained models by adding pairs of rank-decomposition weight matrices (called **update matrices**)
to existing weights and **only** training those newly added weights. This has a couple of advantages:

- Previous pretrained weights are kept frozen so that the model is not as prone to [catastrophic forgetting](https://www.pnas.org/doi/10.1073/pnas.1611835114).
- Rank-decomposition matrices have significantly fewer parameters than the original model, which means that trained LoRA weights are easily portable.
- LoRA matrices are generally added to the attention layers of the original model, and they control to what extent the model is adapted toward new training images via a `scale` parameter.

**__Note that the usage of LoRA is not just limited to attention layers. In the original LoRA work, the authors found that adapting just
the attention layers of a language model is sufficient to obtain good downstream performance with great efficiency. This is why it's common
to just add the LoRA weights to the attention layers of a model.__**
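
To make the idea concrete, here is a minimal PyTorch sketch of a low-rank update around a frozen linear layer. It is purely illustrative and not the Diffusers implementation; the class and attribute names (`LoRALinear`, `lora_A`, `lora_B`) are made up for this example:

```py
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative low-rank adapter around a frozen linear layer (not the Diffusers implementation)."""

    def __init__(self, base: nn.Linear, rank: int = 4, scale: float = 1.0):
        super().__init__()
        self.base = base
        self.base.requires_grad_(False)  # pretrained weights stay frozen
        # Rank-decomposition pair: only these two small matrices are trained.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = scale

    def forward(self, x):
        # Frozen output + scaled low-rank update: W x + scale * (B @ A) x
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)
```

Because `lora_B` starts at zero, the adapted layer initially behaves exactly like the pretrained one, and `scale` lets you dial the update in or out at inference time, which is what the `scale` parameter discussed later in this document does.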
[cloneofsimo](https://github.com/cloneofsimo) was the first to try out LoRA training for Stable Diffusion in the popular [lora](https://github.com/cloneofsimo/lora) GitHub repository.

<Tip>

LoRA allows us to achieve greater memory efficiency since the pretrained weights are kept frozen and only the LoRA weights are trained, thereby
allowing us to run fine-tuning on consumer GPUs like the Tesla T4, RTX 3080, or even RTX 2080 Ti! GPUs like the T4 are available in the free
tiers of Kaggle Kernels and Google Colab notebooks.

</Tip>
## Getting started with LoRA for fine-tuning

Stable Diffusion can be fine-tuned in different ways:

* [Textual inversion](https://huggingface.co/docs/diffusers/main/en/training/text_inversion)
* [DreamBooth](https://huggingface.co/docs/diffusers/main/en/training/dreambooth)
* [Text2Image fine-tuning](https://huggingface.co/docs/diffusers/main/en/training/text2image)

We provide two end-to-end examples that show how to run fine-tuning with LoRA:

* [DreamBooth](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#training-with-low-rank-adaptation-of-large-language-models-lora)
* [Text2Image](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image#training-with-lora)
If you want to perform DreamBooth training with LoRA, for instance, you would run:

```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export INSTANCE_DIR="path-to-instance-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --checkpointing_steps=100 \
  --learning_rate=1e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=50 \
  --seed="0" \
  --push_to_hub
```
A similar process can be followed to fine-tune Stable Diffusion with LoRA on a custom dataset using the
`examples/text_to_image/train_text_to_image_lora.py` script, as sketched below. Refer to the respective examples linked above to learn more.
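
For instance, an invocation along the following lines should work. The flags mirror the DreamBooth example above, but the exact set the script accepts may differ, so check its `--help` output; the dataset identifier comes from the inference example later in this document, and the output directory name is a placeholder:

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"

accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=15000 \
  --seed=42 \
  --output_dir="sd-pokemon-model-lora"
```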
<Tip>

When using LoRA, we can use a much higher learning rate (typically 1e-4 as opposed to ~1e-6) compared to non-LoRA DreamBooth fine-tuning.

</Tip>
But there is no free lunch. For a given dataset and expected generation quality, you'd still need to experiment with
different hyperparameters. Here are some important ones:

* Training time
  * Learning rate
  * Number of training steps
* Inference time
  * Number of steps
  * Scheduler type

Additionally, you can follow [this blog](https://huggingface.co/blog/dreambooth) that documents some of our experimental
findings for performing DreamBooth training of Stable Diffusion.

When fine-tuning, the LoRA update matrices are only added to the attention layers. To enable this, we added new weight
loading functionalities. Their details are available [here](https://huggingface.co/docs/diffusers/main/en/api/loaders).
## Inference

Assuming you used the `examples/text_to_image/train_text_to_image_lora.py` script to fine-tune Stable Diffusion on the [Pokemon
dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions), you can perform inference like so:

```py
from diffusers import StableDiffusionPipeline
import torch

# Repository containing the trained LoRA attention weights
model_path = "sayakpaul/sd-model-finetuned-lora-t4"
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe.unet.load_attn_procs(model_path)  # load the LoRA update matrices into the UNet's attention layers
pipe.to("cuda")

prompt = "A pokemon with blue eyes."
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("pokemon.png")
```
Here are some example images you can expect:

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pokemon-collage.png"/>

[`sayakpaul/sd-model-finetuned-lora-t4`](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4) contains [LoRA fine-tuned update matrices](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4/blob/main/pytorch_lora_weights.bin)
that are only 3 MB in size. During inference, the pre-trained Stable Diffusion checkpoint is loaded alongside these update
matrices, and then they are combined to run inference.
You can use the [`huggingface_hub`](https://github.com/huggingface/huggingface_hub) library to retrieve the base model
from [`sayakpaul/sd-model-finetuned-lora-t4`](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4) like so:

```py
from huggingface_hub.repocard import RepoCard

# The base model is recorded in the LoRA repository's model card metadata
card = RepoCard.load("sayakpaul/sd-model-finetuned-lora-t4")
base_model = card.data.to_dict()["base_model"]
# 'CompVis/stable-diffusion-v1-4'
```

And then you can use `pipe = StableDiffusionPipeline.from_pretrained(base_model, torch_dtype=torch.float16)`.
This is especially useful when you don't want to hardcode the base model identifier when initializing the `StableDiffusionPipeline`.
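
Putting the two steps together, a minimal sketch using the same LoRA repository as above:

```py
import torch
from diffusers import StableDiffusionPipeline
from huggingface_hub.repocard import RepoCard

lora_model_id = "sayakpaul/sd-model-finetuned-lora-t4"

# Resolve the base model from the LoRA repository's card metadata
card = RepoCard.load(lora_model_id)
base_model = card.data.to_dict()["base_model"]

# Load the base pipeline and attach the LoRA attention weights
pipe = StableDiffusionPipeline.from_pretrained(base_model, torch_dtype=torch.float16)
pipe.unet.load_attn_procs(lora_model_id)
pipe.to("cuda")

image = pipe("A pokemon with blue eyes.", num_inference_steps=30).images[0]
```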
Inference for DreamBooth training remains the same. Check
[this section](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#inference-1) for more details.
### Merging LoRA with original model

When performing inference, you can merge the trained LoRA weights with the frozen pre-trained model weights to interpolate between the original model's inference result (as if no fine-tuning had occurred) and the fully fine-tuned version.

You can adjust the merging ratio with a parameter called α (alpha) in the paper, or `scale` in our implementation. You can tweak it with the following code, which passes `scale` as part of `cross_attention_kwargs` in the pipeline call:

```py
from diffusers import StableDiffusionPipeline
import torch

model_path = "sayakpaul/sd-model-finetuned-lora-t4"
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")

prompt = "A pokemon with blue eyes."
# scale=0.5 blends the frozen base weights and the LoRA update in equal measure
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5, cross_attention_kwargs={"scale": 0.5}).images[0]
image.save("pokemon.png")
```

A value of `0` is the same as _not_ using the LoRA weights, whereas `1` means only the LoRA fine-tuned weights will be used. Values between 0 and 1 interpolate between the two versions.
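
To see the effect of `scale` side by side, here is a small illustrative sweep that reuses the `pipe` and `prompt` from the previous snippet; fixing the seed keeps everything except `scale` constant across the generations:

```py
import torch

# Generate the same prompt at several LoRA scales for comparison
for scale in (0.0, 0.5, 1.0):
    generator = torch.Generator(device="cuda").manual_seed(0)  # fixed seed for a fair comparison
    image = pipe(
        prompt,
        num_inference_steps=30,
        guidance_scale=7.5,
        generator=generator,
        cross_attention_kwargs={"scale": scale},
    ).images[0]
    image.save(f"pokemon_scale_{scale}.png")
```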
## Known limitations

* Currently, we only support LoRA for the attention layers of [`UNet2DConditionModel`](https://huggingface.co/docs/diffusers/main/en/api/models#diffusers.UNet2DConditionModel).