---
title: Unboxing SDXL with SAEs
app_file: app.py
sdk: gradio
sdk_version: 4.44.1
---

# Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders

![modification demonstration](resourses/image.png)

This repository contains code to reproduce the results of our paper (https://arxiv.org/abs/2410.22366) on using sparse autoencoders (SAEs) to analyze and interpret the internal representations of text-to-image diffusion models, specifically SDXL Turbo.

## Repository Structure

```
|-- SAE/                             # Core sparse autoencoder implementation
|-- SDLens/                          # Tools for analyzing diffusion models
|   `-- hooked_sd_pipeline.py        # Modified stable diffusion pipeline
|-- scripts/
|   |-- collect_latents_dataset.py   # Generate training data
|   `-- train_sae.py                 # Train SAE models
|-- utils/
|   `-- hooks.py                     # Hook utility functions
|-- checkpoints/                     # Pretrained SAE model checkpoints
|-- app.py                           # Demo application
|-- app.ipynb                        # Interactive notebook demo
|-- example.ipynb                    # Usage examples
`-- requirements.txt                 # Python dependencies
```

## Installation

```bash
pip install -r requirements.txt
```

## Demo Application

You can try our Gradio demo application (`app.ipynb`) to browse and experiment with the 20K+ features of our trained SAEs out of the box. The same notebook is available on [Google Colab](https://colab.research.google.com/drive/1Sd-g3w2Fwv7pc_fxgeQOR3S_RKr18qMP?usp=sharing).

## Usage

1. Collect latent data from SDXL Turbo:

   ```bash
   python scripts/collect_latents_dataset.py --save_path={your_save_path}
   ```

2. Train sparse autoencoders:

   2.1. Set the path to the stored latents and the directory for saving checkpoints in `SAE/config.json`.

   2.2. Run the training script:

   ```bash
   python scripts/train_sae.py
   ```

## Pretrained Models

We provide pretrained SAE checkpoints for four key transformer blocks in SDXL Turbo's U-Net. See `example.ipynb` for analysis examples and visualizations of the learned features.
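For readers new to SAEs, the following is a minimal sketch of what a sparse autoencoder computes on a single activation vector, assuming a top-k activation of the kind found in `openai/sparse_autoencoder`. It is not this repository's code: the function names, weights, and sizes are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Illustrative only: a tiny top-k sparse autoencoder forward pass in pure Python.
# All names, shapes, and weights are hypothetical, not this repository's API.


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def topk_sparsify(activations, k):
    """Apply ReLU, then keep the k largest activations and zero out the rest."""
    relu = [max(a, 0.0) for a in activations]
    keep = set(sorted(range(len(relu)), key=lambda i: relu[i], reverse=True)[:k])
    return [relu[i] if i in keep else 0.0 for i in range(len(relu))]


def sae_forward(x, w_enc, b_enc, w_dec, b_dec, k):
    """Encode x into sparse features, then reconstruct x from those features."""
    # Encoder: subtract the decoder bias, project, add the encoder bias.
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre_acts = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    features = topk_sparsify(pre_acts, k)
    # Decoder: weighted sum of dictionary directions (columns of w_dec) plus bias.
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, recon


# Toy setup: a 2-dimensional activation and a dictionary of 3 features.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # 3 features x 2 dims
B_ENC = [0.0, 0.0, 0.0]
W_DEC = [[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]]    # 2 dims x 3 features
B_DEC = [0.0, 0.0]

features, recon = sae_forward([2.0, 1.0], W_ENC, B_ENC, W_DEC, B_DEC, k=1)
print(features)  # only the single strongest feature stays active
print(recon)     # reconstruction built from that one feature
```

With `k=1` only the strongest feature survives, so the reconstruction keeps the first coordinate and drops the second; raising `k` to 2 recovers the input exactly in this toy setup. Interventions like the ones in the demo amount to editing entries of `features` before decoding.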
## Citation

If you find this code useful in your research, please cite our paper:

```bibtex
@misc{surkov2024unpackingsdxlturbointerpreting,
      title={Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders},
      author={Viacheslav Surkov and Chris Wendler and Mikhail Terekhov and Justin Deschenaux and Robert West and Caglar Gulcehre},
      year={2024},
      eprint={2410.22366},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2410.22366},
}
```

## Acknowledgements

The SAE component is based on the [`openai/sparse_autoencoder`](https://github.com/openai/sparse_autoencoder) repository.