
This demo showcases a lightweight Stable Diffusion model (SDM) for general-purpose text-to-image synthesis. Our model, BK-SDM-Small, achieves a 36% reduction in parameters and inference latency. It is built by (i) removing several residual and attention blocks from the U-Net of SDM-v1.4 and (ii) distillation pretraining on only 0.22M LAION pairs (fewer than 0.1% of the full training set). Despite these very limited training resources, our model imitates the original SDM by benefiting from transferred knowledge.

Figure: U-Net architectures and KD-based pretraining
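
To try the compressed model outside this demo, the following is a minimal sketch of loading it with the diffusers library; the checkpoint id `nota-ai/bk-sdm-small` is an assumption and may differ from the one this Space actually uses.

```python
# Minimal sketch: generating an image with the compressed model via diffusers.
# The model id "nota-ai/bk-sdm-small" is an assumption; substitute the actual
# checkpoint path if it differs.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "nota-ai/bk-sdm-small", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("output.png")
```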

Notice

  • This research is accepted to ICCV 2023 Demo Track — title: Architecturally Compressed Stable Diffusion for Efficient Text-to-Image Generation.
  • Please be aware that your prompts are logged (without any personally identifiable information).
  • To generate different images with the same prompt, change Random Seed in Advanced Settings, because this demo uses only the first latent code sampled for each seed (see the sketch after this list).
  • Many parts of the demo code were borrowed from stabilityai/stable-diffusion and akhaliq/small-stable-diffusion-v0. Thanks, Stability AI (@stabilityai) and AK (@akhaliq)!
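
As a concrete illustration of the seed behavior noted above, here is a minimal sketch, assuming the `pipe` object from the earlier snippet: fixing the generator seed pins the initial latent code, so the same seed reproduces the same image and a new seed yields a different one.

```python
# Minimal sketch of how a fixed seed pins the initial latent code.
# Assumes `pipe` from the snippet above; the prompt and seeds are arbitrary.
import torch

def generate(prompt, seed):
    generator = torch.Generator(device="cuda").manual_seed(seed)
    return pipe(prompt, num_inference_steps=25, generator=generator).images[0]

img_a = generate("a cozy cabin in a snowy forest", seed=42)   # identical to img_b
img_b = generate("a cozy cabin in a snowy forest", seed=42)
img_c = generate("a cozy cabin in a snowy forest", seed=123)  # a different image
```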

Updates

  • [May/31/2023] The demo is running on T4-small (4 vCPU · 15 GB RAM · 16 GB VRAM). The original model takes 5 to 10 seconds to generate a 512×512 image with 25 denoising steps. Our compressed model accelerates inference while preserving visually compelling results.
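
For a rough local comparison of generation time, the sketch below times a single 512×512 generation with 25 denoising steps, assuming the `pipe` object from the first snippet and a CUDA GPU; this timing code is illustrative and not part of the demo itself.

```python
# Rough latency check, assuming `pipe` from the first snippet and a CUDA GPU.
# Numbers will vary with hardware; the T4 figures above are from the hosted demo.
import time
import torch

prompt = "a watercolor painting of a lighthouse at dusk"
pipe(prompt, num_inference_steps=25)          # warm-up run
torch.cuda.synchronize()
start = time.perf_counter()
pipe(prompt, num_inference_steps=25)          # timed run (512x512 by default)
torch.cuda.synchronize()
print(f"512x512, 25 steps: {time.perf_counter() - start:.2f} s")
```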