# Stable Diffusion Text Inversion with Loss Functions

This repository contains a Gradio web application that provides an intuitive interface for generating images using Stable Diffusion with textual inversion and guided loss functions.

## Overview

The application allows users to explore the capabilities of Stable Diffusion by:

- Generating images from text prompts
- Using textual inversion concepts
- Applying various loss functions to guide the diffusion process
- Generating multiple images with different seeds

![alt text](image.png)

## Features

### Core Functionality

- **Text-to-Image Generation**: Create detailed images from descriptive text prompts
- **Textual Inversion**: Apply learned concepts to your generations
- **Loss Function Guidance**: Shape image generation with specialized loss functions:
  - **Blue Loss**: Emphasizes blue tones in the generated images
  - **Elastic Loss**: Creates distortion effects by applying elastic transformations
  - **Symmetry Loss**: Encourages symmetrical image generation
  - **Saturation Loss**: Enhances color saturation in the output
- **Multi-Seed Generation**: Create multiple variations of an image with different seeds

## Installation

### Prerequisites

- Python 3.8+
- CUDA-capable GPU (recommended)

### Setup

1. Clone this repository:

   ```bash
   git clone https://github.com/yourusername/stable-diffusion-text-inversion.git
   cd stable-diffusion-text-inversion
   ```

2. Install dependencies:

   ```bash
   pip install torch diffusers transformers tqdm torchvision matplotlib gradio
   ```

3. Run the application:

   ```bash
   python gradio_app.py
   ```

4. Open the provided URL (typically http://localhost:7860) in your browser.

## Understanding the Technology

### Stable Diffusion

Stable Diffusion is a latent text-to-image diffusion model developed by Stability AI. It works by:

1. **Encoding text**: Converting text prompts into embeddings that the model can understand
2. **Starting with noise**: Beginning with random noise in a latent space
3. **Iterative denoising**: Gradually removing noise while being guided by the text embeddings
4. **Decoding to image**: Converting the final latent representation to a pixel-based image

The model operates in a compressed latent space (64x64x4) rather than pixel space (512x512x3), allowing for efficient generation of high-resolution images with limited computational resources.

### Textual Inversion

Textual Inversion is a technique that allows Stable Diffusion to learn new concepts from just a few example images. Key aspects include:

- **Custom Concepts**: Learn new visual concepts not present in the model's training data
- **Few-Shot Learning**: Typically requires only 3-5 examples of a concept
- **Token Optimization**: Creates a new "pseudo-word" embedding that represents the concept
- **Seamless Integration**: Once learned, concepts can be used in prompts just like regular words

In this application, we load several pre-trained textual inversion concepts from the SD concepts library:

- Rimworld art style
- HK Golden Lantern
- Phoenix-01
- Fractal Flame
- Scarlet Witch

### Guided Loss Functions

This application introduces an innovative approach by applying custom loss functions during the diffusion process:

1. **How it works**: During generation, we periodically decode the current latent representation, apply a loss function to the decoded image, and backpropagate that loss to adjust the latents.
2. **Types of Loss Functions**:
   - **Blue Loss**: Encourages pixels to have higher values in the blue channel
   - **Elastic Loss**: Minimizes the difference between the image and an elastically transformed version
   - **Symmetry Loss**: Minimizes the difference between the image and its horizontal mirror
   - **Saturation Loss**: Pushes the image toward higher color saturation
3. **Impact**: These loss functions can dramatically alter the aesthetic qualities of the generated images, creating effects that would be difficult to achieve through prompt engineering alone.
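Loading the textual inversion concepts listed above with the Diffusers library might look like the following sketch. The Hub repo ids are assumptions inferred from the concept names, so verify the exact ids in the `sd-concepts-library` organization on the Hugging Face Hub:

```python
# Hypothetical repo ids -- check the sd-concepts-library org on the Hub
# for the exact names before using them.
SD_CONCEPTS = [
    "sd-concepts-library/rimworld-art-style",
    "sd-concepts-library/hk-golden-lantern",
    "sd-concepts-library/phoenix-01",
    "sd-concepts-library/fractal-flame",
    "sd-concepts-library/scarlet-witch",
]


def load_concepts(pipe, concepts=SD_CONCEPTS):
    """Load each textual inversion concept into the pipeline so its
    placeholder token (e.g. "<phoenix-01>") can be used in prompts."""
    for repo_id in concepts:
        pipe.load_textual_inversion(repo_id)
    return pipe


# Usage (requires diffusers and a model download):
# from diffusers import StableDiffusionPipeline
# pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# load_concepts(pipe)
```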
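The loss functions above can be sketched in plain PyTorch. These are minimal illustrative versions, not the repository's exact implementations; each takes a batch of decoded images shaped `(B, 3, H, W)` with values in `[0, 1]` and returns a scalar to minimize:

```python
import torch


def blue_loss(images):
    # Penalize the blue channel's distance from 1.0, so a lower
    # loss corresponds to bluer images.
    return (1.0 - images[:, 2]).mean()


def symmetry_loss(images):
    # Mean absolute difference between the image and its
    # horizontal mirror; zero for perfectly symmetric images.
    flipped = torch.flip(images, dims=[3])
    return (images - flipped).abs().mean()


def saturation_loss(images):
    # Saturation ~ spread between the max and min channel per pixel;
    # negate so that minimizing the loss increases saturation.
    sat = images.max(dim=1).values - images.min(dim=1).values
    return -sat.mean()
```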
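The guidance step described in "How it works" can be sketched as below. This is a simplified version: `decode_fn` stands in for a differentiable latent-to-image decode (e.g. the VAE), and the step `scale` and how often the step runs are assumptions that the app may tune differently:

```python
import torch


def guide_latents(latents, decode_fn, loss_fn, scale=0.1):
    """One guidance step: decode the latents, compute the loss on the
    decoded image, and nudge the latents down the loss gradient."""
    latents = latents.detach().requires_grad_(True)
    images = decode_fn(latents)   # latent (B, 4, 64, 64) -> image (B, 3, 512, 512)
    loss = loss_fn(images)
    grad = torch.autograd.grad(loss, latents)[0]
    return (latents - scale * grad).detach()
```

In the full loop this would be called every few denoising steps, between the scheduler update and the next UNet forward pass.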
## Usage Examples

### Basic Image Generation

1. Enter a prompt in the text box (e.g., "A majestic castle on a floating island with waterfalls")
2. Set Loss Type to "N/A" and uncheck "Apply Loss Function"
3. Enter a seed value (e.g., "42")
4. Click "Generate Images"

### Applying Loss Functions

1. Enter your prompt
2. Select a Loss Type (e.g., "symmetry")
3. Check "Apply Loss Function"
4. Enter a seed value
5. Click "Generate Images"

### Batch Generation

1. Enter your prompt
2. Select desired loss settings
3. Enter multiple comma-separated seeds (e.g., "42, 100, 500")
4. Click "Generate Images" to generate a grid of variations

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- [Stability AI](https://stability.ai/) for developing Stable Diffusion
- [Hugging Face](https://huggingface.co/) for the Diffusers library
- [Gradio](https://gradio.app/) for the web interface framework
- The creators of the textual inversion concepts used in this project
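## Appendix: Multi-Seed Generation Sketch

For reference, the comma-separated seed field under Batch Generation can be handled as in this sketch. The function names are illustrative, not the app's actual helpers; each seed gets its own `torch.Generator` so every image in the grid is independently reproducible:

```python
import torch


def parse_seeds(seed_text):
    """Parse a comma-separated seed string like "42, 100, 500" into ints."""
    return [int(s) for s in seed_text.split(",") if s.strip()]


def generators_for(seed_text, device="cpu"):
    """One seeded torch.Generator per seed in the input string."""
    return [torch.Generator(device).manual_seed(s) for s in parse_seeds(seed_text)]


# Usage with a diffusers pipeline (assumed standard API):
# images = [pipe(prompt, generator=g).images[0]
#           for g in generators_for("42, 100, 500")]
```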