
compressed-stable-diffusion / docs /description.md

Bo-Kyeong Kim


	This demo showcases a lightweight Stable Diffusion model (SDM) for general-purpose text-to-image synthesis. Our model, BK-SDM-Small, reduces both parameter count and latency by 36%. It is built by (i) removing several residual and attention blocks from the U-Net of SDM-v1.4 and (ii) distillation pretraining on only 0.22M LAION pairs (fewer than 0.1% of the full training set). Despite very limited training resources, our model can imitate the original SDM by benefiting from transferred knowledge.

	<center>
	<img alt="U-Net architectures and KD-based pretraining" src="https://huggingface.co/spaces/nota-ai/theme/resolve/3bb3eed8b911d0baf306767bb9548bf732052c53/docs/compressed_stable_diffusion/fig_model.png" width="65%">
	</center>

	<br/>


	### Notice
	- This research was accepted to the [ICCV 2023 Demo Track](https://iccv2023.thecvf.com/) under the title _Architecturally Compressed Stable Diffusion for Efficient Text-to-Image Generation_.
	- Please be aware that your prompts are logged (_without_ any personally identifiable information).
	- To generate different images with the same prompt, please change _Random Seed_ in Advanced Settings (this demo uses only the first latent code sampled for each seed).
	- Many parts of the demo code were borrowed from [stabilityai/stable-diffusion](https://huggingface.co/spaces/stabilityai/stable-diffusion) and [akhaliq/small-stable-diffusion-v0](https://huggingface.co/spaces/akhaliq/small-stable-diffusion-v0). Thanks, Stability AI ([@stabilityai](https://huggingface.co/stabilityai)) and AK ([@akhaliq](https://huggingface.co/akhaliq))!
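
The note on _Random Seed_ can be illustrated with a minimal sketch. This is not the demo's actual code: it uses Python's standard `random` module as a stand-in for the real noise generator, and the helper name `initial_latent` is hypothetical. The 4×64×64 shape is the usual latent size for a 512×512 Stable Diffusion image. The point is only that one seed maps to one starting latent, so changing the seed is what changes the image.

```python
import random


def initial_latent(seed, channels=4, height=64, width=64):
    """Draw one Gaussian latent from a generator seeded with `seed`.

    Hypothetical helper for illustration; the real demo samples its
    latent with its own framework, not with Python's `random` module.
    """
    rng = random.Random(seed)  # fresh generator, so only the seed matters
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]
        for _ in range(channels)
    ]


latent_a = initial_latent(1234)
latent_b = initial_latent(1234)  # same seed as latent_a
latent_c = initial_latent(5678)  # different seed

print(latent_a == latent_b)  # same seed gives the same starting latent
print(latent_a == latent_c)  # a different seed gives a different latent
```

Because the generator is re-created from the seed on every call, repeating a request with an unchanged seed starts denoising from the same latent each time.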

	### Updates
	- [May/31/2023] The demo is running on T4-small (4 vCPU · 15 GB RAM · 16 GB VRAM). The original model takes 5 to 10 seconds to generate a 512×512 image with 25 denoising steps. Our compressed model speeds up inference while preserving visually compelling results.