---
license: apache-2.0
datasets:
- lambdalabs/pokemon-blip-captions
---

# Introduction

This is an example model for [Distill SDXL](https://github.com/okotaku/diffengine/tree/main/configs/distill_sd).

It was trained with [DiffEngine](https://github.com/okotaku/diffengine), an open-source toolbox for training state-of-the-art diffusion models with diffusers and mmengine.

Paper: [On Architectural Compression of Text-to-Image Diffusion Models](https://arxiv.org/abs/2305.15798)

Unofficial implementation: https://github.com/segmind/distill-sd
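
Since the paper is about architectural compression, one quick sanity check is to compare the parameter count of the distilled UNet with that of the base SDXL UNet. The following is a minimal sketch, not part of the original training recipe: `count_parameters` and `compare_unets` are hypothetical helper names, and `compare_unets` assumes `diffusers` is installed and that both repositories can be downloaded.

```python
def count_parameters(module) -> int:
    """Total number of elements across all parameters of a PyTorch-style module."""
    return sum(p.numel() for p in module.parameters())


def compare_unets():
    """Download both UNets and return (distilled, base) parameter counts."""
    # Imported lazily: this step needs diffusers and network access.
    from diffusers import UNet2DConditionModel

    distilled = UNet2DConditionModel.from_pretrained(
        "takuoko/tiny_sd_xl_pokemon_blip"
    )
    base = UNet2DConditionModel.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
    )
    return count_parameters(distilled), count_parameters(base)
```

Calling `compare_unets()` returns the two counts as plain integers, so the ratio between them can be computed directly.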
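
# Training

```
# Install the OpenMMLab package manager, which provides the `mim` command.
pip install openmim
# Install DiffEngine from source.
pip install git+https://github.com/okotaku/diffengine.git
# Launch training with the tiny SDXL Pokemon config.
mim train diffengine tiny_sd_xl_pokemon_blip.py
```

More details are in my blog post:
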
# Dataset

I used [lambdalabs/pokemon-blip-captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions).

# Inference

```
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, AutoencoderKL

checkpoint = 'takuoko/tiny_sd_xl_pokemon_blip'
prompt = 'a picture of a pink and yellow pokemon with a sword'

# Load the distilled UNet from this repository.
unet = UNet2DConditionModel.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16
)
# Load the fp16-fix SDXL VAE in the same dtype.
vae = AutoencoderKL.from_pretrained(
    'madebyollin/sdxl-vae-fp16-fix',
    torch_dtype=torch.bfloat16,
)
# Swap both components into the base SDXL pipeline.
pipe = DiffusionPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0', unet=unet, vae=vae, torch_dtype=torch.bfloat16
)
pipe.to('cuda')

image = pipe(
    prompt,
    num_inference_steps=50,
).images[0]
image.save('demo.png')
```
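
The inference snippet hardcodes `'cuda'` and `torch.bfloat16`. On a machine without CUDA or without bfloat16 support, a fallback can be picked before loading the models. The following is a minimal sketch: `pick_device_and_dtype` and `detect_device_and_dtype` are hypothetical helper names, and the fallback order (bfloat16, then float16 on GPU, then float32 on CPU) is an assumption rather than something this model card prescribes.

```python
def pick_device_and_dtype(cuda_available: bool, bf16_supported: bool):
    """Return (device, dtype name) for the given hardware capabilities."""
    if not cuda_available:
        # CPU inference: half precision is poorly supported, so use float32.
        return "cpu", "float32"
    if bf16_supported:
        return "cuda", "bfloat16"
    # Older GPUs without bfloat16 support fall back to float16.
    return "cuda", "float16"


def detect_device_and_dtype():
    """Query PyTorch for the hardware and return (device, torch.dtype)."""
    import torch  # imported lazily so the pure helper above has no dependencies

    cuda = torch.cuda.is_available()
    bf16 = cuda and torch.cuda.is_bf16_supported()
    device, dtype_name = pick_device_and_dtype(cuda, bf16)
    return device, getattr(torch, dtype_name)
```

With `device, dtype = detect_device_and_dtype()`, pass `dtype` as `torch_dtype` to each `from_pretrained` call and use `pipe.to(device)` in place of `pipe.to('cuda')`.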

# Example result

prompt = 'a picture of a pink and yellow pokemon with a sword'

![image](demo.png)