alexnasa committed
Commit 87a38c7 · verified · 1 parent: 4479f79

Update README.md

Files changed (1):
  1. README.md +12 -122
README.md CHANGED
@@ -1,122 +1,12 @@
- # XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation
-
- <p align="center">
- <a href="https://arxiv.org/abs/2506.21416">
- <img alt="Build" src="https://img.shields.io/badge/arXiv%20paper-2506.21416-b31b1b.svg">
- </a>
- <a href="https://bytedance.github.io/XVerse/">
- <img alt="Project Page" src="https://img.shields.io/badge/Project-Page-blue">
- </a>
- <a href="https://github.com/bytedance/XVerse/tree/main/assets">
- <img alt="Build" src="https://img.shields.io/badge/XVerseBench-Dataset-green">
- </a>
- <a href="https://huggingface.co/ByteDance/XVerse">
- <img alt="Build" src="https://img.shields.io/badge/🤗-HF%20Model-yellow">
- </a>
- </p>
-
- ## 🔥 News
- - **2025.6.26**: The code has been released!
-
- ![XVerse's capability in single/multi-subject personalization and semantic attribute control (pose, style, lighting)](sample/first_page.png)
-
- ## 📖 Introduction
-
- **XVerse** introduces a novel approach to multi-subject image synthesis, offering **precise and independent control over individual subjects** without disrupting the overall image latents or features. We achieve this by transforming reference images into offsets for token-specific text-stream modulation.
-
- This innovation enables high-fidelity, editable image generation where you can robustly control both **individual subject characteristics** (identity) and their **semantic attributes**. XVerse significantly enhances capabilities for personalized and complex scene generation.
-
- ## ⚡️ Quick Start
-
- ### Requirements and Installation
-
- First, install the necessary dependencies:
-
- ```bash
- # Create a conda environment named XVerse with Python version 3.10.16
- conda create -n XVerse python=3.10.16 -y
- # Activate the XVerse environment
- conda activate XVerse
- # Use pip to install the dependencies specified in requirements.txt
- pip install -r requirements.txt
- ```
-
- Next, download the required checkpoints:
- ```bash
- cd checkpoints
- bash ./download_ckpts.sh
- cd ..
- ```
- **Important**: You'll also need to download the face recognition model `model_ir_se50.pth` from [InsightFace_Pytorch](https://github.com/TreB1eN/InsightFace_Pytorch) and place it directly into the `./checkpoints/` folder.
-
- ### Local Gradio Demo
-
- To run the interactive Gradio demo locally, execute the following command:
- ```bash
- bash run_demo.sh
- ```
-
- #### Input Settings Explained
- The Gradio demo provides several parameters to control your image generation process:
- * **Prompt**: The textual description guiding the image generation.
- * **Generated Height/Width**: Use the sliders to set the shape of the output image.
- * **Weight_id/ip**: Adjust these weight parameters. Higher values generally lead to better subject consistency but might slightly impact the naturalness of the generated image.
- * **latent_lora_scale and vae_lora_scale**: Control the LoRA scale. Similar to Weight_id/ip, larger LoRA values can improve subject consistency but may reduce image naturalness.
- * **vae_skip_iter_before and vae_skip_iter_after**: Configure VAE skip iterations. Skipping more steps can result in better naturalness but might compromise subject consistency.
-
- #### Input Images
-
- The demo provides detailed control over your input images:
-
- * **Expand Panel**: Click "Input Image X" to reveal the options for each image.
- * **Upload Image**: Click "Image X" to upload your desired reference image.
- * **Image Description**: Enter a description in the "Caption X" input box. You can also click "Auto Caption" to generate a description automatically.
- * **Detection & Segmentation**: Click "Det & Seg" to perform detection and segmentation on the uploaded image.
- * **Crop Face**: Use "Crop Face" to automatically crop the face from the image.
- * **ID Checkbox**: Check or uncheck "ID or not" to determine whether to use ID-related weights for that specific input image.
-
- > **⚠️ Important Usage Notes:**
- >
- > * **Prompt Construction**: The main text prompt **MUST** include the exact text you entered in the `Image Description` field for each active image. **Generation will fail if this description is missing from the prompt.**
- > * *Example*: If you upload two images and set their descriptions as "a man with red hair" (for Image 1) and "a woman with blue eyes" (for Image 2), your main prompt might be: "A `a man with red hair` walking beside `a woman with blue eyes` in a park."
- > * You can then write your main prompt simply as: "`ENT1` walking beside `ENT2` in a park." The code will **automatically replace** these placeholders with the full description text before generation.
- > * **Active Images**: Only images in **expanded** (un-collapsed) panels will be fed into the model. Collapsed image panels are ignored.
-
- ## Inference with XVerseBench
-
- ![XVerseBench](sample/XVerseBench.png)
-
- First, please download XVerseBench according to the contents in the `assets` folder. Then, when running inference, please execute the following command:
- ```bash
- bash ./eval/eval_scripts/run_eval.sh
- ```
- The script will automatically evaluate the model on the XVerseBench dataset and save the results in the `./results` folder.
-
- ## 📌 ToDo
-
- - [x] Release github repo.
- - [x] Release arXiv paper.
- - [x] Release model checkpoints.
- - [x] Release inference data: XVerseBench.
- - [x] Release inference code for XVerseBench.
- - [x] Release inference code for gradio demo.
- - [ ] Release inference code for single sample.
- - [ ] Release huggingface space demo.
- - [ ] Release Benchmark Leaderboard.
-
- ## License
-
- The code in this project is licensed under Apache 2.0; the dataset is licensed under CC0, subject to the intellectual property owned by Bytedance. Meanwhile, the dataset is adapted from [dreambench++](https://dreambenchplus.github.io/); you should also comply with the license of dreambench++.
-
- ## Citation
- If XVerse is helpful, please help to ⭐ the repo.
-
- If you find this project useful for your research, please consider citing our paper:
- ```bibtex
- @article{chen2025xverse,
-   title={XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation},
-   author={Chen, Bowen and Zhao, Mengyi and Sun, Haomiao and Chen, Li and Wang, Xu and Du, Kang and Wu, Xinglong},
-   journal={arXiv preprint arXiv:2506.21416},
-   year={2025}
- }
- ```

+ ---
+ title: XVerse
+ emoji: 📉
+ colorFrom: gray
+ colorTo: indigo
+ sdk: gradio
+ sdk_version: 5.34.1
+ app_file: app.py
+ pinned: false
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
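The removed README's usage notes state that `ENT1`/`ENT2` placeholders in the main prompt are automatically replaced with the corresponding image descriptions before generation. A minimal sketch of that substitution, assuming a hypothetical `expand_prompt` helper (not the actual XVerse implementation):

```python
def expand_prompt(prompt: str, descriptions: list[str]) -> str:
    """Replace ENT<i> placeholders with the i-th image description.

    Iterates from the highest index down so that e.g. "ENT10" is
    substituted before "ENT1" matches it as a prefix.
    """
    for i in range(len(descriptions), 0, -1):
        prompt = prompt.replace(f"ENT{i}", descriptions[i - 1])
    return prompt


result = expand_prompt(
    "ENT1 walking beside ENT2 in a park.",
    ["a man with red hair", "a woman with blue eyes"],
)
print(result)
# a man with red hair walking beside a woman with blue eyes in a park.
```

This mirrors the README's own example, where the expanded prompt must contain each active image's description verbatim.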