FilterPrompt: Guiding Image Transfer in Diffusion Models
We propose FilterPrompt, an approach for enhancing controllability in diffusion-based image transfer. It can be applied to any diffusion model, allowing users to adjust the representation of specific image features according to task requirements and thereby achieve more precise and controllable generation results. Our experiments demonstrate that FilterPrompt optimizes feature correlation, mitigates content conflicts during the generation process, and enhances the model's control capability. Concretely, the implementation is inspired by the decoupled cross-attention mechanism of IP-Adapter and applies a similar methodology.
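For intuition, the core idea is to apply simple image filters to the conditioning inputs so that unwanted features are suppressed before generation. The snippet below is a minimal, hypothetical sketch of such preprocessing using Pillow; the file names and filter parameters are illustrative assumptions, not the repository's actual defaults:

```python
from PIL import Image, ImageFilter

# Appearance reference: Gaussian blur suppresses high-frequency structure,
# so mainly low-frequency cues (color, tone) remain to be transferred.
appearance = Image.open("appearance.png")  # hypothetical input file
appearance_prompt = appearance.filter(ImageFilter.GaussianBlur(radius=5))

# Structure reference: converting to grayscale discards color information,
# so mainly geometry and layout remain to be transferred.
structure = Image.open("structure.png")  # hypothetical input file
structure_prompt = structure.convert("L")
```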
Getting Started
Prerequisites
- We recommend running this repository using Anaconda.
- NVIDIA GPU with more than 20 GB of available memory
- CUDA and cuDNN (version ≥ 11.1; we use 11.7)
- Python 3.11.3 (Gradio requires Python 3.8 or higher)
- PyTorch: install the torch build that matches your CUDA version (a quick sanity check is shown after this list). Example:
pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
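To confirm that the installed PyTorch build actually sees your GPU and was compiled for the expected CUDA version, a minimal check such as the following can be run (plain PyTorch APIs, nothing project-specific):

```python
import torch

# Verify the installed build and that CUDA is usable before proceeding.
print(torch.__version__)              # e.g. 2.0.0+cu117
print(torch.version.cuda)             # CUDA version the build targets
print(torch.cuda.is_available())      # should print True on a working setup
print(torch.cuda.get_device_name(0))  # confirm the expected NVIDIA GPU
```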
Installation
Please follow the instructions below to complete the environment configuration required for the code:
- Cloning this repo
git clone --single-branch --branch main https://github.com/Meaoxixi/FilterPrompt.git
- Dependencies
All dependencies for defining the environment are provided in requirements.txt.
cd FilterPrompt
conda create --name fp_env python=3.11.3
conda activate fp_env
pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
- Download the necessary models into the relative path models/ from the following links:
Path | Description |
---|---|
models/ | root path |
├── ControlNet/ | place the pre-trained ControlNet models here |
│   ├── control_v11f1p_sd15_depth | ControlNet_depth |
│   └── control_v11p_sd15_softedge | ControlNet_softEdge |
├── IP-Adapter/ | IP-Adapter |
│   ├── image_encoder | image encoder of IP-Adapter |
│   └── other needed configuration files | |
├── sd-vae-ft-mse/ | place the sd-vae-ft-mse model here |
├── stable-diffusion-v1-5/ | place the stable-diffusion-v1-5 model here |
└── Realistic_Vision_V4.0_noVAE/ | place the Realistic_Vision_V4.0_noVAE model here |
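If you prefer scripting the downloads, the Hugging Face Hub client can fetch each component into the layout above. The repo IDs below are the usual upstream sources for these weights and are assumptions on our part; substitute the links provided by the authors if they differ:

```python
from huggingface_hub import snapshot_download

# Assumed upstream repositories for each component; adjust if the
# project links to different sources.
components = {
    "lllyasviel/control_v11f1p_sd15_depth": "models/ControlNet/control_v11f1p_sd15_depth",
    "lllyasviel/control_v11p_sd15_softedge": "models/ControlNet/control_v11p_sd15_softedge",
    "h94/IP-Adapter": "models/IP-Adapter",
    "stabilityai/sd-vae-ft-mse": "models/sd-vae-ft-mse",
    "runwayml/stable-diffusion-v1-5": "models/stable-diffusion-v1-5",
    "SG161222/Realistic_Vision_V4.0_noVAE": "models/Realistic_Vision_V4.0_noVAE",
}

for repo_id, local_dir in components.items():
    snapshot_download(repo_id=repo_id, local_dir=local_dir)
```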
Demo on Gradio
After installation and downloading the models, run python app.py to launch the Gradio demo. We provide four task types so you can explore the application scenarios of FilterPrompt.
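To give a sense of how the downloaded components fit together under the hood, here is a minimal diffusers-based sketch. This is not the repository's app.py; the prompt, file names, weight name, and step count are illustrative assumptions:

```python
import torch
from PIL import Image, ImageFilter
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load the depth ControlNet and the base model from the local models/ tree.
controlnet = ControlNetModel.from_pretrained(
    "models/ControlNet/control_v11f1p_sd15_depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "models/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Attach IP-Adapter so an image (not only text) conditions appearance.
pipe.load_ip_adapter(
    "models/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin"
)

# FilterPrompt-style preprocessing: blur the appearance reference so only
# low-frequency cues (color, tone) are transferred, then condition structure
# on a depth map (assumed precomputed here).
appearance = Image.open("appearance.png").filter(ImageFilter.GaussianBlur(5))
depth_map = Image.open("depth.png")

result = pipe(
    prompt="best quality",
    image=depth_map,
    ip_adapter_image=appearance,
    num_inference_steps=30,
).images[0]
result.save("result.png")
```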
Citation
If you find FilterPrompt helpful in your research/applications, please cite using this BibTeX:
@misc{wang2024filterprompt,
  title={FilterPrompt: Guiding Image Transfer in Diffusion Models},
  author={Xi Wang and Yichen Peng and Heng Fang and Haoran Xie and Xi Yang and Chuntao Li},
  year={2024},
  eprint={2404.13263},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}