--- license: openrail datasets: - ChristophSchuhmann/LAION-5B-EN-Aesthetics-Subset_above_6 --- ![banner-large.jpeg](https://s3.amazonaws.com/moonup/production/uploads/1674039767068-62bd5f951e22ec84279820e8.jpeg) Image Mixer is a model that lets you combine the concepts, styles, and compositions from multiple images (and text prompts too) and generate new images. It was trained by [Justin Pinkney](https://www.justinpinkney.com) at [Lambda Labs](https://lambdalabs.com/). ## Training details This model is a fine tuned version of [Stable Diffusion Image Variations](https://huggingface.co/lambdalabs/sd-image-variations-diffusers) it has been trained to accept multiple CLIP embedding concatenated along the sequence dimension (as opposed to 1 in the original model). During training up to 5 crops of the training images are taken and CLIP embeddings are extracted, these are concatenated and used as the conditioning for the model. At inference time, CLIP embeddings from multiple images can be used to generate images which are influence by multiple inputs. Training was done at 640x640 on a subset of LAION improved aesthetics, using 8xA100 from [Lambda GPU Cloud](https://cloud.lambdalabs.com). ## Usage The model is available on [huggingface spaces](https://huggingface.co/spaces/lambdalabs/image-mixer-demo) or to run locally do the following: ```bash git clone https://github.com/justinpinkney/stable-diffusion.git cd stable-diffusion git checkout 1c8a598f312e54f614d1b9675db0e66382f7e23c python -m venv .venv --prompt sd . .venv/bin/activate pip install -U pip pip install -r requirements.txt python scripts/gradio_image_mixer.py ``` Then navigate to the gradio demo link printed in the terminal. For details on how to use the model outside the app refer to the [`run` function](https://github.com/justinpinkney/stable-diffusion/blob/c1963a36a4f8ce23784c8247fa1af0e34e02b766/scripts/gradio_image_mixer.py#L79) in `gradio_image_mixer.py`