Load adapters

[[open-in-colab]]

ํŠน์ • ๋ฌผ์ฒด์˜ ์ด๋ฏธ์ง€ ๋˜๋Š” ํŠน์ • ์Šคํƒ€์ผ์˜ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๋„๋ก diffusion ๋ชจ๋ธ์„ ๊ฐœ์ธํ™”ํ•˜๊ธฐ ์œ„ํ•œ ๋ช‡ ๊ฐ€์ง€ ํ•™์Šต ๊ธฐ๋ฒ•์ด ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ํ•™์Šต ๋ฐฉ๋ฒ•์€ ๊ฐ๊ฐ ๋‹ค๋ฅธ ์œ ํ˜•์˜ ์–ด๋Œ‘ํ„ฐ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์ผ๋ถ€ ์–ด๋Œ‘ํ„ฐ๋Š” ์™„์ „ํžˆ ์ƒˆ๋กœ์šด ๋ชจ๋ธ์„ ์ƒ์„ฑํ•˜๋Š” ๋ฐ˜๋ฉด, ๋‹ค๋ฅธ ์–ด๋Œ‘ํ„ฐ๋Š” ์ž„๋ฒ ๋”ฉ ๋˜๋Š” ๊ฐ€์ค‘์น˜์˜ ์ž‘์€ ๋ถ€๋ถ„๋งŒ ์ˆ˜์ •ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ๊ฐ ์–ด๋Œ‘ํ„ฐ์˜ ๋กœ๋”ฉ ํ”„๋กœ์„ธ์Šค๋„ ๋‹ค๋ฅด๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.

์ด ๊ฐ€์ด๋“œ์—์„œ๋Š” DreamBooth, textual inversion ๋ฐ LoRA ๊ฐ€์ค‘์น˜๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๋ฐฉ๋ฒ•์„ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

Feel free to browse the Stable Diffusion Conceptualizer, LoRA the Explorer, and the Diffusers Models Gallery for checkpoints and embeddings to use.

DreamBooth

DreamBooth finetunes an entire diffusion model on just several images of a subject to generate images of that subject in new styles and settings. This method works by using a special word in the prompt that the model learns to associate with the subject images. Of all the training methods, DreamBooth produces the largest file size (usually a few GBs) because it is a full checkpoint model.

Hergรฉ๊ฐ€ ๊ทธ๋ฆฐ ๋‹จ 10๊ฐœ์˜ ์ด๋ฏธ์ง€๋กœ ํ•™์Šต๋œ herge_style ์ฒดํฌํฌ์ธํŠธ๋ฅผ ๋ถˆ๋Ÿฌ์™€ ํ•ด๋‹น ์Šคํƒ€์ผ์˜ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ์ด ๋ชจ๋ธ์ด ์ž‘๋™ํ•˜๋ ค๋ฉด ์ฒดํฌํฌ์ธํŠธ๋ฅผ ํŠธ๋ฆฌ๊ฑฐํ•˜๋Š” ํ”„๋กฌํ”„ํŠธ์— ํŠน์ˆ˜ ๋‹จ์–ด herge_style์„ ํฌํ•จ์‹œ์ผœ์•ผ ํ•ฉ๋‹ˆ๋‹ค:

from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("sd-dreambooth-library/herge-style", torch_dtype=torch.float16).to("cuda")
prompt = "A cute herge_style brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration"
image = pipeline(prompt).images[0]
image

Textual inversion

Textual inversion์€ DreamBooth์™€ ๋งค์šฐ ์œ ์‚ฌํ•˜๋ฉฐ ๋ช‡ ๊ฐœ์˜ ์ด๋ฏธ์ง€๋งŒ์œผ๋กœ ํŠน์ • ๊ฐœ๋…(์Šคํƒ€์ผ, ๊ฐœ์ฒด)์„ ์ƒ์„ฑํ•˜๋Š” diffusion ๋ชจ๋ธ์„ ๊ฐœ์ธํ™”ํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ ํ”„๋กฌํ”„ํŠธ์— ํŠน์ • ๋‹จ์–ด๋ฅผ ์ž…๋ ฅํ•˜๋ฉด ํ•ด๋‹น ์ด๋ฏธ์ง€๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ์ƒˆ๋กœ์šด ์ž„๋ฒ ๋”ฉ์„ ํ•™์Šตํ•˜๊ณ  ์ฐพ์•„๋‚ด๋Š” ๋ฐฉ์‹์œผ๋กœ ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค. ๊ฒฐ๊ณผ์ ์œผ๋กœ diffusion ๋ชจ๋ธ ๊ฐ€์ค‘์น˜๋Š” ๋™์ผํ•˜๊ฒŒ ์œ ์ง€๋˜๊ณ  ํ›ˆ๋ จ ํ”„๋กœ์„ธ์Šค๋Š” ๋น„๊ต์  ์ž‘์€(์ˆ˜ KB) ํŒŒ์ผ์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

Textual inversion์€ ์ž„๋ฒ ๋”ฉ์„ ์ƒ์„ฑํ•˜๊ธฐ ๋•Œ๋ฌธ์— DreamBooth์ฒ˜๋Ÿผ ๋‹จ๋…์œผ๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์—†์œผ๋ฉฐ ๋˜ ๋‹ค๋ฅธ ๋ชจ๋ธ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

Now you can load the textual inversion embeddings with the [~loaders.TextualInversionLoaderMixin.load_textual_inversion] method and generate some images. Let's load the sd-concepts-library/gta5-artwork embeddings; you need to include the special word <gta5-artwork> in your prompt to trigger it:

pipeline.load_textual_inversion("sd-concepts-library/gta5-artwork")
prompt = "A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration, <gta5-artwork> style"
image = pipeline(prompt).images[0]
image

Textual inversion์€ ๋˜ํ•œ ๋ฐ”๋žŒ์งํ•˜์ง€ ์•Š์€ ์‚ฌ๋ฌผ์— ๋Œ€ํ•ด ๋„ค๊ฑฐํ‹ฐ๋ธŒ ์ž„๋ฒ ๋”ฉ์„ ์ƒ์„ฑํ•˜์—ฌ ๋ชจ๋ธ์ด ํ๋ฆฟํ•œ ์ด๋ฏธ์ง€๋‚˜ ์†์˜ ์ถ”๊ฐ€ ์†๊ฐ€๋ฝ๊ณผ ๊ฐ™์€ ๋ฐ”๋žŒ์งํ•˜์ง€ ์•Š์€ ์‚ฌ๋ฌผ์ด ํฌํ•จ๋œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜์ง€ ๋ชปํ•˜๋„๋ก ํ•™์Šตํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ํ”„๋กฌํ”„ํŠธ๋ฅผ ๋น ๋ฅด๊ฒŒ ๊ฐœ์„ ํ•˜๋Š” ๊ฒƒ์ด ์‰ฌ์šด ๋ฐฉ๋ฒ•์ด ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ์ด์ „๊ณผ ๊ฐ™์ด ์ž„๋ฒ ๋”ฉ์„ [~loaders.TextualInversionLoaderMixin.load_textual_inversion]์œผ๋กœ ๋ถˆ๋Ÿฌ์˜ค์ง€๋งŒ ์ด๋ฒˆ์—๋Š” ๋‘ ๊ฐœ์˜ ๋งค๊ฐœ๋ณ€์ˆ˜๊ฐ€ ๋” ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค:

  • weight_name: specifies the weight file to load if the file was saved in the 🤗 Diffusers format with a specific name or if it was saved in the A1111 format.
  • token: specifies the special word to use in the prompt to trigger the embeddings.

Let's load the sayakpaul/EasyNegative-test embeddings:

pipeline.load_textual_inversion(
    "sayakpaul/EasyNegative-test", weight_name="EasyNegative.safetensors", token="EasyNegative"
)

์ด์ œ token์„ ์‚ฌ์šฉํ•ด ๋„ค๊ฑฐํ‹ฐ๋ธŒ ์ž„๋ฒ ๋”ฉ์ด ์žˆ๋Š” ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

prompt = "A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration, EasyNegative"
negative_prompt = "EasyNegative"

image = pipeline(prompt, negative_prompt=negative_prompt, num_inference_steps=50).images[0]
image

LoRA

Low-Rank Adaptation (LoRA) is a popular training technique because it is fast and generates smaller file sizes (a couple hundred MBs). Like the other methods in this guide, LoRA can train a model to learn new styles from just a few images. It works by inserting new weights into the diffusion model and then training only the new weights instead of the entire model. This makes LoRAs faster to train and easier to store.
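The small file sizes follow from the low-rank update itself. The following toy sketch (plain NumPy, not the diffusers internals; the layer width and rank are made up) shows a frozen weight augmented with a trainable rank-r pair:

```python
import numpy as np

# Toy sketch of a LoRA update: the frozen weight W is augmented with a
# trainable low-rank product B @ A; only B and A are trained and saved.
rng = np.random.default_rng(0)
d, r = 64, 4                       # layer width and LoRA rank
W = rng.normal(size=(d, d))        # frozen base weight
A = rng.normal(size=(r, d))        # trainable down-projection
B = np.zeros((d, r))               # trainable up-projection, zero-initialized

effective_W = W + B @ A            # B starts at zero, so the model is unchanged
assert np.allclose(effective_W, W)

# Parameter counts explain the file sizes:
full_params = d * d                # 4096 values for a full finetune of W
lora_params = d * r + r * d        # 512 values for the LoRA update
print(lora_params / full_params)   # 0.125: an 8x smaller file in this toy case
```

At higher ranks the savings shrink, which is one reason LoRA files vary in size between checkpoints.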

LoRA is a very general training technique that can be used with other training methods. For example, it is common to train a model with DreamBooth and LoRA. It is also increasingly common to load and merge multiple LoRAs to create new and unique images. Merging is outside the scope of this loading guide; you can learn more about it in the in-depth Merge LoRAs guide.

LoRA๋Š” ๋‹ค๋ฅธ ๋ชจ๋ธ๊ณผ ํ•จ๊ป˜ ์‚ฌ์šฉํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค:

from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")

Then use the [~loaders.LoraLoaderMixin.load_lora_weights] method to load the ostris/super-cereal-sdxl-lora weights and specify the weights filename from the repository:

pipeline.load_lora_weights("ostris/super-cereal-sdxl-lora", weight_name="cereal_box_sdxl_v1.safetensors")
prompt = "bears, pizza bites"
image = pipeline(prompt).images[0]
image

The [~loaders.LoraLoaderMixin.load_lora_weights] method loads LoRA weights into both the UNet and the text encoder. It is the preferred way to load a LoRA because it can handle cases where:

  • the LoRA weights don't have separate identifiers for the UNet and the text encoder
  • the LoRA weights do have separate identifiers for the UNet and the text encoder

But if you only need to load LoRA weights into the UNet, you can use the [~loaders.UNet2DConditionLoadersMixin.load_attn_procs] method. Let's load the jbilcke-hf/sdxl-cinematic-1 LoRA:

from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.unet.load_attn_procs("jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors")

# ํ”„๋กฌํ”„ํŠธ์—์„œ cnmt๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ LoRA๋ฅผ ํŠธ๋ฆฌ๊ฑฐํ•ฉ๋‹ˆ๋‹ค.
prompt = "A cute cnmt eating a slice of pizza, stunning color scheme, masterpiece, illustration"
image = pipeline(prompt).images[0]
image

To unload the LoRA weights, use the [~loaders.LoraLoaderMixin.unload_lora_weights] method to discard the LoRA weights and restore the model to its original weights:

pipeline.unload_lora_weights()

LoRA ๊ฐ€์ค‘์น˜ ์Šค์ผ€์ผ ์กฐ์ •ํ•˜๊ธฐ

For both [~loaders.LoraLoaderMixin.load_lora_weights] and [~loaders.UNet2DConditionLoadersMixin.load_attn_procs], you can pass the cross_attention_kwargs={"scale": 0.5} parameter to adjust how much of the LoRA weights to use. A value of 0 is the same as only using the base model weights, and a value of 1 is equivalent to using the fully finetuned LoRA.
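Numerically, the scale can be thought of as blending the LoRA delta into the frozen weight. A toy sketch (plain NumPy, not the diffusers internals; sizes are made up):

```python
import numpy as np

# Toy sketch of the "scale" value: the LoRA delta B @ A is blended into
# the frozen weight as W + scale * (B @ A).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))                        # frozen base weight
B, A = rng.normal(size=(8, 2)), rng.normal(size=(2, 8))

def scaled_weight(scale):
    return W + scale * (B @ A)

assert np.allclose(scaled_weight(0.0), W)          # 0 -> base model only
assert np.allclose(scaled_weight(1.0), W + B @ A)  # 1 -> full finetuned LoRA
half = scaled_weight(0.5)                          # values in between blend
assert np.allclose(half, (scaled_weight(0.0) + scaled_weight(1.0)) / 2)
```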

๋ ˆ์ด์–ด๋‹น ์‚ฌ์šฉ๋˜๋Š” LoRA ๊ฐ€์ค‘์น˜์˜ ์–‘์„ ๋ณด๋‹ค ์„ธ๋ฐ€ํ•˜๊ฒŒ ์ œ์–ดํ•˜๋ ค๋ฉด [~loaders.LoraLoaderMixin.set_adapters]๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ ๋ ˆ์ด์–ด์˜ ๊ฐ€์ค‘์น˜๋ฅผ ์–ผ๋งˆ๋งŒํผ ์กฐ์ •ํ• ์ง€ ์ง€์ •ํ•˜๋Š” ๋”•์…”๋„ˆ๋ฆฌ๋ฅผ ์ „๋‹ฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

pipe = ...  # create pipeline
pipe.load_lora_weights(..., adapter_name="my_adapter")
scales = {
    "text_encoder": 0.5,
    "text_encoder_2": 0.5,  # only usable if the pipeline has a second text encoder
    "unet": {
        "down": 0.9,  # all transformers in the down-part will use scale 0.9
        # "mid" is not given here, so all transformers in the mid-part will use the default scale 1.0
        "up": {
            "block_0": 0.6,  # all 3 transformers in the 0th block in the up-part will use scale 0.6
            "block_1": [0.4, 0.8, 1.0],  # the 3 transformers in the 1st block in the up-part will use scales 0.4, 0.8 and 1.0 respectively
        }
        }
    }
}
pipe.set_adapters("my_adapter", scales)

This also works with multiple adapters; see this guide for how to do it.

At the moment, [~loaders.LoraLoaderMixin.set_adapters] only supports scaling attention weights. If a LoRA has other parts (e.g., resnets or down-/upsamplers), they will keep a scale of 1.0.
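Conceptually, when several adapters are active, each one contributes its own scaled delta on top of the same frozen weight. A toy sketch (plain NumPy, not the diffusers internals; the adapter names and scales are made up, mimicking something like set_adapters(["style", "subject"], adapter_weights=[0.7, 0.8])):

```python
import numpy as np

# Toy sketch of combining several LoRA adapters: each active adapter adds
# its own scaled low-rank delta to the frozen weight.
rng = np.random.default_rng(0)
d, r = 16, 2
W = rng.normal(size=(d, d))                        # frozen base weight
adapters = {                                       # hypothetical adapter names
    "style": (rng.normal(size=(d, r)), rng.normal(size=(r, d))),
    "subject": (rng.normal(size=(d, r)), rng.normal(size=(r, d))),
}
weights = {"style": 0.7, "subject": 0.8}           # per-adapter scales

effective_W = W + sum(weights[n] * (B @ A) for n, (B, A) in adapters.items())
assert effective_W.shape == (d, d)

# Setting every weight to zero recovers the base model exactly.
base_only = W + sum(0.0 * (B @ A) for _, (B, A) in adapters.items())
assert np.allclose(base_only, W)
```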

Kohya and TheLastBen

์ปค๋ฎค๋‹ˆํ‹ฐ์—์„œ ์ธ๊ธฐ ์žˆ๋Š” ๋‹ค๋ฅธ LoRA trainer๋กœ๋Š” Kohya์™€ TheLastBen์˜ trainer๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด trainer๋“ค์€ ๐Ÿค— Diffusers๊ฐ€ ํ›ˆ๋ จํ•œ ๊ฒƒ๊ณผ๋Š” ๋‹ค๋ฅธ LoRA ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์ƒ์„ฑํ•˜์ง€๋งŒ, ๊ฐ™์€ ๋ฐฉ์‹์œผ๋กœ ๋ถˆ๋Ÿฌ์˜ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

To load a Kohya LoRA, let's download the Blueprintify SD XL 1.0 checkpoint from Civitai as an example:

!wget https://civitai.com/api/download/models/168776 -O blueprintify-sd-xl-10.safetensors

LoRA ์ฒดํฌํฌ์ธํŠธ๋ฅผ [~loaders.LoraLoaderMixin.load_lora_weights] ๋ฉ”์„œ๋“œ๋กœ ๋ถˆ๋Ÿฌ์˜ค๊ณ  weight_name ํŒŒ๋ผ๋ฏธํ„ฐ์— ํŒŒ์ผ๋ช…์„ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค:

from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_lora_weights("path/to/weights", weight_name="blueprintify-sd-xl-10.safetensors")

์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค:

# use bl3uprint in the prompt to trigger the LoRA
prompt = "bl3uprint, a highly detailed blueprint of the eiffel tower, explaining how to build all parts, many txt, blueprint grid backdrop"
image = pipeline(prompt).images[0]
image

Some limitations of using Kohya LoRAs with 🤗 Diffusers include:

  • Images may not look like those generated by UIs like ComfyUI, for multiple reasons which are explained here.
  • LyCORIS checkpoints aren't fully supported. The [~loaders.LoraLoaderMixin.load_lora_weights] method loads LyCORIS checkpoints with LoRA and LoCon modules, but Hada and LoKR are not supported.

TheLastBen์—์„œ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๋ฐฉ๋ฒ•์€ ๋งค์šฐ ์œ ์‚ฌํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, TheLastBen/William_Eggleston_Style_SDXL ์ฒดํฌํฌ์ธํŠธ๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๋ ค๋ฉด:

from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_lora_weights("TheLastBen/William_Eggleston_Style_SDXL", weight_name="wegg.safetensors")

# use william eggleston in the prompt to trigger the LoRA
prompt = "a house by william eggleston, sunrays, beautiful, sunlight, sunrays, beautiful"
image = pipeline(prompt=prompt).images[0]
image

IP-Adapter

IP-Adapter is a lightweight adapter that enables image prompting for any diffusion model. This adapter works by decoupling the cross-attention layers of the image and text features. All the other model components are frozen and only the embedded image features in the UNet are trained. As a result, IP-Adapter files are typically only ~100MBs.
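The decoupled cross-attention can be sketched numerically: the image features get their own attention branch whose output is added, scaled, to the regular text cross-attention output. A toy example (plain NumPy, not the diffusers implementation; all sizes are made up):

```python
import numpy as np

# Toy sketch of decoupled cross-attention: a separate attention branch for
# image features is added on top of the usual text cross-attention.
rng = np.random.default_rng(0)

def attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ v

q = rng.normal(size=(4, 8))                                      # latent queries
k_txt, v_txt = rng.normal(size=(6, 8)), rng.normal(size=(6, 8))  # text features
k_img, v_img = rng.normal(size=(3, 8)), rng.normal(size=(3, 8))  # image features

def decoupled_attention(scale):
    return attention(q, k_txt, v_txt) + scale * attention(q, k_img, v_img)

# scale 0 falls back to pure text conditioning
assert np.allclose(decoupled_attention(0.0), attention(q, k_txt, v_txt))
print(decoupled_attention(1.0).shape)  # (4, 8)
```

Because only the image branch is trained while everything else stays frozen, the adapter file stays small.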

You can learn more about how to use IP-Adapter for various tasks and specific use cases in the IP-Adapter guide.

Diffusers currently only supports IP-Adapter for some of the most popular pipelines. Feel free to open a feature request if you have a cool use case and want to integrate IP-Adapter with an unsupported pipeline! Official IP-Adapter checkpoints are available from h94/IP-Adapter.

To start, load a Stable Diffusion checkpoint.

from diffusers import AutoPipelineForText2Image
import torch
from diffusers.utils import load_image

pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

๊ทธ๋Ÿฐ ๋‹ค์Œ IP-Adapter ๊ฐ€์ค‘์น˜๋ฅผ ๋ถˆ๋Ÿฌ์™€ [~loaders.IPAdapterMixin.load_ip_adapter] ๋ฉ”์„œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํŒŒ์ดํ”„๋ผ์ธ์— ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")

Once loaded, you can use the pipeline with an image and a text prompt to guide the image generation process.

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png")
generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality, wearing sunglasses',
    ip_adapter_image=image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    generator=generator,
).images[0]
images
   

IP-Adapter Plus

IP-Adapter๋Š” ์ด๋ฏธ์ง€ ์ธ์ฝ”๋”๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ด๋ฏธ์ง€ feature๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. IP-Adapter ๋ฆฌํฌ์ง€ํ† ๋ฆฌ์— image_encoder ํ•˜์œ„ ํด๋”๊ฐ€ ์žˆ๋Š” ๊ฒฝ์šฐ, ์ด๋ฏธ์ง€ ์ธ์ฝ”๋”๊ฐ€ ์ž๋™์œผ๋กœ ๋ถˆ๋Ÿฌ์™€ ํŒŒ์ดํ”„๋ผ์ธ์— ๋“ฑ๋ก๋ฉ๋‹ˆ๋‹ค. ๊ทธ๋ ‡์ง€ ์•Š์€ ๊ฒฝ์šฐ, [~transformers.CLIPVisionModelWithProjection] ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ด๋ฏธ์ง€ ์ธ์ฝ”๋”๋ฅผ ๋ช…์‹œ์ ์œผ๋กœ ๋ถˆ๋Ÿฌ์™€ ํŒŒ์ดํ”„๋ผ์ธ์— ์ „๋‹ฌํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

This is the case for IP-Adapter Plus checkpoints, which use the ViT-H image encoder.

import torch
from diffusers import AutoPipelineForText2Image
from transformers import CLIPVisionModelWithProjection

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter",
    subfolder="models/image_encoder",
    torch_dtype=torch.float16
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    image_encoder=image_encoder,
    torch_dtype=torch.float16
).to("cuda")

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter-plus_sdxl_vit-h.safetensors")

IP-Adapter Face ID ๋ชจ๋ธ

IP-Adapter FaceID ๋ชจ๋ธ์€ CLIP ์ด๋ฏธ์ง€ ์ž„๋ฒ ๋”ฉ ๋Œ€์‹  insightface์—์„œ ์ƒ์„ฑํ•œ ์ด๋ฏธ์ง€ ์ž„๋ฒ ๋”ฉ์„ ์‚ฌ์šฉํ•˜๋Š” ์‹คํ—˜์ ์ธ IP Adapter์ž…๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ชจ๋ธ ์ค‘ ์ผ๋ถ€๋Š” LoRA๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ID ์ผ๊ด€์„ฑ์„ ๊ฐœ์„ ํ•˜๊ธฐ๋„ ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋ ค๋ฉด insightface์™€ ํ•ด๋‹น ์š”๊ตฌ ์‚ฌํ•ญ์„ ๋ชจ๋‘ ์„ค์น˜ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

InsightFace ์‚ฌ์ „ํ•™์Šต๋œ ๋ชจ๋ธ์€ ๋น„์ƒ์—…์  ์—ฐ๊ตฌ ๋ชฉ์ ์œผ๋กœ๋งŒ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ, IP-Adapter-FaceID ๋ชจ๋ธ์€ ์—ฐ๊ตฌ ๋ชฉ์ ์œผ๋กœ๋งŒ ๋ฆด๋ฆฌ์ฆˆ๋˜์—ˆ์œผ๋ฉฐ ์ƒ์—…์  ์šฉ๋„๋กœ๋Š” ์‚ฌ์šฉํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค.
pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")

pipeline.load_ip_adapter("h94/IP-Adapter-FaceID", subfolder=None, weight_name="ip-adapter-faceid_sdxl.bin", image_encoder_folder=None)

If you want to use one of the two IP-Adapter FaceID Plus models, you also need to load the CLIP image encoder, since those models use both insightface and CLIP image embeddings to achieve better photorealism.

import torch
from diffusers import AutoPipelineForText2Image
from transformers import CLIPVisionModelWithProjection

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
    torch_dtype=torch.float16,
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    image_encoder=image_encoder,
    torch_dtype=torch.float16
).to("cuda")

pipeline.load_ip_adapter("h94/IP-Adapter-FaceID", subfolder=None, weight_name="ip-adapter-faceid-plus_sd15.bin")