
This demo showcases a lightweight Stable Diffusion model (SDM) for general-purpose text-to-image synthesis. Our model, BK-SDM-Small, achieves a 36% reduction in parameters and inference latency. It is built by (i) removing several residual and attention blocks from the U-Net of SDM-v1.4 and (ii) distillation pretraining on only 0.22M LAION pairs (fewer than 0.1% of the full training set). Despite these very limited training resources, our model imitates the original SDM by benefiting from transferred knowledge.

Figure: U-Net architectures and KD-based pretraining
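
To try the compressed model outside this demo, the following is a minimal sketch of loading it with the diffusers library; the checkpoint id `nota-ai/bk-sdm-small` is an assumption and may differ from the one this Space actually uses.

```python
# Minimal sketch: generating an image with the compressed model via diffusers.
# The model id "nota-ai/bk-sdm-small" is an assumption; substitute the actual
# checkpoint path if it differs.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "nota-ai/bk-sdm-small", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("output.png")
```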

Notice

  • This research is accepted to ICCV 2023 Demo Track — title: Architecturally Compressed Stable Diffusion for Efficient Text-to-Image Generation.
  • Please be aware that your prompts are logged (without any personally identifiable information).
  • To generate different images with the same prompt, change Random Seed in Advanced Settings, because this demo uses only the first latent code sampled for each seed (see the sketch after this list).
  • Many parts of the demo code were borrowed from stabilityai/stable-diffusion and akhaliq/small-stable-diffusion-v0. Thanks, Stability AI (@stabilityai) and AK (@akhaliq)!
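
As a concrete illustration of the seed behavior noted above, here is a minimal sketch, assuming the `pipe` object from the earlier snippet: fixing the generator seed pins the initial latent code, so the same seed reproduces the same image and a new seed yields a different one.

```python
# Minimal sketch of how a fixed seed pins the initial latent code.
# Assumes `pipe` from the snippet above; the prompt and seeds are arbitrary.
import torch

def generate(prompt, seed):
    generator = torch.Generator(device="cuda").manual_seed(seed)
    return pipe(prompt, num_inference_steps=25, generator=generator).images[0]

img_a = generate("a cozy cabin in a snowy forest", seed=42)   # identical to img_b
img_b = generate("a cozy cabin in a snowy forest", seed=42)
img_c = generate("a cozy cabin in a snowy forest", seed=123)  # a different image
```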

Updates

  • [May/31/2023] The demo is running on T4-small (4 vCPU · 15 GB RAM · 16 GB VRAM). The original model takes 5 to 10 seconds to generate a 512×512 image with 25 denoising steps. Our compressed model accelerates inference while preserving visually compelling results.
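
For a rough local comparison of generation time, the sketch below times a single 512×512 generation with 25 denoising steps, assuming the `pipe` object from the first snippet and a CUDA GPU; this timing code is illustrative and not part of the demo itself.

```python
# Rough latency check, assuming `pipe` from the first snippet and a CUDA GPU.
# Numbers will vary with hardware; the T4 figures above are from the hosted demo.
import time
import torch

prompt = "a watercolor painting of a lighthouse at dusk"
pipe(prompt, num_inference_steps=25)          # warm-up run
torch.cuda.synchronize()
start = time.perf_counter()
pipe(prompt, num_inference_steps=25)          # timed run (512x512 by default)
torch.cuda.synchronize()
print(f"512x512, 25 steps: {time.perf_counter() - start:.2f} s")
```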