Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
Abstract
Sparse autoencoders (SAEs) have become a core ingredient in the reverse engineering of large language models (LLMs). For LLMs, they have been shown to decompose intermediate representations, which are often not directly interpretable, into sparse sums of interpretable features, facilitating better control and subsequent analysis. However, similar analyses and approaches have been lacking for text-to-image models. We investigate the possibility of using SAEs to learn interpretable features for few-step text-to-image diffusion models, such as SDXL Turbo. To this end, we train SAEs on the updates performed by transformer blocks within SDXL Turbo's denoising U-Net. We find that their learned features are interpretable, causally influence the generation process, and reveal specialization among the blocks. In particular, we find one block that deals mainly with image composition, one that is mainly responsible for adding local details, and one for color, illumination, and style. Our work is thus an important first step towards better understanding the internals of generative text-to-image models like SDXL Turbo, and it showcases the potential of features learned by SAEs for the visual domain. Code is available at https://github.com/surkovv/sdxl-unbox
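As a rough illustration of the setup the abstract describes, the sketch below trains a k-sparse autoencoder on residual updates captured from a transformer block. The module names, dimensions, TopK activation, and training details are illustrative assumptions, not the paper's actual implementation; see the linked repository for that.

```python
# Minimal sketch: a k-sparse autoencoder trained to reconstruct the updates
# (output minus input) that a transformer block adds to the residual stream.
# All hyperparameters here are placeholders, not the paper's settings.
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    def __init__(self, d_model: int, n_features: int, k: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)
        self.k = k

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Keep only the k largest pre-activations per token; zero the rest.
        pre = torch.relu(self.encoder(x))
        topk = torch.topk(pre, self.k, dim=-1)
        codes = torch.zeros_like(pre)
        return codes.scatter_(-1, topk.indices, topk.values)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reconstruct the block update as a sparse sum of feature directions.
        return self.decoder(self.encode(x))

sae = TopKSAE(d_model=1280, n_features=1280 * 16, k=32)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
x = torch.randn(4096, 1280)          # stand-in for captured block updates
loss = (sae(x) - x).pow(2).mean()    # plain reconstruction objective
opt.zero_grad(); loss.backward(); opt.step()
```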
Community
Create similar feature mixes in our 🤗 Spaces Demo! https://huggingface.co/spaces/surokpro2/Unboxing_SDXL_with_SAEs
GitHub: https://github.com/surkovv/sdxl-unbox
TLDR
We trained sparse autoencoders (SAEs) on the updates performed by the transformer blocks of Stable Diffusion XL Turbo's denoising U-Net. The learned features are highly interpretable, causal, and can be used to manipulate generated images. Additionally, these SAE features reveal the specific roles that the transformer blocks play in the generation process.
Our work is the first to mechanistically interpret the intermediate representations of a modern text-to-image diffusion model.
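To make "manipulate generated images" concrete, here is a hypothetical steering hook: shift a block's output along one SAE feature's decoder direction during generation. The block stand-in, feature index, and scale below are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch of steering with one SAE feature. Since the SAE is
# trained on the block's update, shifting the block output shifts that
# update by the same amount.
import torch
import torch.nn as nn

@torch.no_grad()
def steer(hidden: torch.Tensor, decoder: nn.Linear,
          feature_idx: int, scale: float) -> torch.Tensor:
    # Column feature_idx of the decoder weight is the feature's direction
    # in model space.
    direction = decoder.weight[:, feature_idx]   # shape: (d_model,)
    return hidden + scale * direction            # broadcasts over tokens

# Stand-in for one transformer block of the denoising U-Net (illustrative).
block = nn.Linear(1280, 1280)
decoder = nn.Linear(1280 * 16, 1280)  # decoder of an SAE trained on this block

# If a forward hook returns a tensor, PyTorch substitutes it for the output.
# (Real diffusers blocks may return tuples, which would need unpacking.)
handle = block.register_forward_hook(
    lambda mod, inputs, output: steer(output, decoder, feature_idx=123, scale=5.0)
)
x = torch.randn(2, 1280)
y = block(x)        # output is now shifted along the chosen feature direction
handle.remove()
```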
The Librarian Bot found the following similar papers, recommended by the Semantic Scholar API:
- Residual Stream Analysis with Multi-Layer SAEs (2024)
- Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis (2024)
- Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders (2024)
- Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups (2024)
- TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder (2024)