Files changed (1)
  1. README.md +276 -276
README.md CHANGED
@@ -1,277 +1,277 @@
1
- ---
2
- license: apache-2.0
3
- ---
4
- <div align="center">
5
-
6
- [//]: # (<h1>CSGO: Content-Style Composition in Text-to-Image Generation</h1>)
7
-
8
- [//]: # ()
9
- [//]: # ([**Peng Xing**]&#40;https://github.com/xingp-ng&#41;<sup>12*</sup> · [**Haofan Wang**]&#40;https://haofanwang.github.io/&#41;<sup>1*</sup> · [**Yanpeng Sun**]&#40;https://scholar.google.com.hk/citations?user=a3FI8c4AAAAJ&hl=zh-CN&oi=ao/&#41;<sup>2</sup> · [**Qixun Wang**]&#40;https://github.com/wangqixun&#41;<sup>1</sup> · [**Xu Bai**]&#40;https://huggingface.co/baymin0220&#41;<sup>1</sup> · [**Hao Ai**]&#40;https://github.com/aihao2000&#41;<sup>13</sup> · [**Renyuan Huang**]&#40;https://github.com/DannHuang&#41;<sup>14</sup> · [**Zechao Li**]&#40;https://zechao-li.github.io/&#41;<sup>2✉</sup>)
10
-
11
- [//]: # ()
12
- [//]: # (<sup>1</sup>InstantX Team · <sup>2</sup>Nanjing University of Science and Technology · <sup>3</sup>Beihang University · <sup>4</sup>Peking University)
13
-
14
- [//]: # (<sup>*</sup>equal contributions, <sup>✉</sup>corresponding authors)
15
-
16
- <a href='https://csgo-gen.github.io/'><img src='https://img.shields.io/badge/Project-Page-green'></a>
17
- <a href='https://arxiv.org/abs/2408.16766'><img src='https://img.shields.io/badge/Technique-Report-red'></a>
18
- [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-App-red)](https://huggingface.co/spaces/xingpng/CSGO/)
19
- [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/spaces/InstantX/CSGO)
20
-
21
-
22
- </div>
23
-
24
-
25
- [//]: # (## Updates 🔥)
26
-
27
- [//]: # ()
28
- [//]: # ([//]: # &#40;- **`2024/07/19`**: ✨ We support 🎞️ portrait video editing &#40;aka v2v&#41;! More to see [here]&#40;assets/docs/changelog/2024-07-19.md&#41;.&#41;)
29
- [//]: # ()
30
- [//]: # ([//]: # &#40;- **`2024/07/17`**: 🍎 We support macOS with Apple Silicon, modified from [jeethu]&#40;https://github.com/jeethu&#41;'s PR [#143]&#40;https://github.com/KwaiVGI/LivePortrait/pull/143&#41;.&#41;)
31
- [//]: # ()
32
- [//]: # ([//]: # &#40;- **`2024/07/10`**: 💪 We support audio and video concatenating, driving video auto-cropping, and template making to protect privacy. More to see [here]&#40;assets/docs/changelog/2024-07-10.md&#41;.&#41;)
33
- [//]: # ()
34
- [//]: # ([//]: # &#40;- **`2024/07/09`**: 🤗 We released the [HuggingFace Space]&#40;https://huggingface.co/spaces/KwaiVGI/liveportrait&#41;, thanks to the HF team and [Gradio]&#40;https://github.com/gradio-app/gradio&#41;!&#41;)
35
- [//]: # ([//]: # &#40;Continuous updates, stay tuned!&#41;)
36
- [//]: # (- **`2024/08/30`**: 😊 We released the initial version of the inference code.)
37
-
38
- [//]: # (- **`2024/08/30`**: 😊 We released the technical report on [arXiv]&#40;https://arxiv.org/pdf/2408.16766&#41;)
39
-
40
- [//]: # (- **`2024/07/15`**: 🔥 We released the [homepage]&#40;https://csgo-gen.github.io&#41;.)
41
-
42
- [//]: # ()
43
- [//]: # (## Plan 💪)
44
-
45
- [//]: # (- [x] technical report)
46
-
47
- [//]: # (- [x] inference code)
48
-
49
- [//]: # (- [ ] pre-trained weight)
50
-
51
- [//]: # (- [ ] IMAGStyle dataset)
52
-
53
- [//]: # (- [ ] training code)
54
-
55
- ## Introduction 📖
56
- This repo, named **CSGO**, contains the official PyTorch implementation of our paper [CSGO: Content-Style Composition in Text-to-Image Generation](https://arxiv.org/pdf/).
57
- We are actively updating and improving this repository. If you find any bugs or have suggestions, welcome to raise issues or submit pull requests (PR) 💖.
58
-
59
- ## Detail ✨
60
- We currently release two model weights.
61
-
62
- | Mode | content token | style token | Other |
63
- |:------------:|:-----------:|:-----------:|:---------------:|
64
- |csgo.bin|4|16| - |
65
- |csgo_4_32.bin|4|32| Deepspeed zero2 |
66
-
67
-
68
- ## Pipeline 💻
69
- <p align="center">
70
- <img src="assets/image3_1.jpg">
71
- </p>
72
-
73
- ## Capabilities 🚅
74
-
75
- 🔥 Our CSGO achieves **image-driven style transfer, text-driven stylized synthesis, and text editing-driven stylized synthesis**.
76
-
77
- 🔥 For more results, visit our <a href="https://csgo-gen.github.io"><strong>homepage</strong></a> 🔥
78
-
79
- <p align="center">
80
- <img src="assets/vis.jpg">
81
- </p>
82
-
83
-
84
- ## Getting Started 🏁
85
- ### 1. Clone the code and prepare the environment
86
- ```bash
87
- git clone https://github.com/instantX-research/CSGO
88
- cd CSGO
89
-
90
- # create env using conda
91
- conda create -n CSGO python=3.9
92
- conda activate CSGO
93
-
94
- # install dependencies with pip
95
- # for Linux and Windows users
96
- pip install -r requirements.txt
97
- ```
98
-
99
- ### 2. Download pretrained weights(coming soon)
100
-
101
- The easiest way to download the pretrained weights is from HuggingFace:
102
- ```bash
103
- # first, ensure git-lfs is installed, see: https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage
104
- git lfs install
105
- # clone and move the weights
106
- git clone https://huggingface.co/InstanX/CSGO CSGO
107
- ```
108
- Our method is fully compatible with [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), [VAE](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix), [ControlNet](https://huggingface.co/TTPlanet/TTPLanet_SDXL_Controlnet_Tile_Realistic), and [Image Encoder](https://huggingface.co/h94/IP-Adapter/tree/main/sdxl_models/image_encoder).
109
- Please download them and place them in the ./base_models folder.
110
-
111
- tips:If you expect to load Controlnet directly using ControlNetPipeline as in CSGO, do the following:
112
- ```bash
113
- git clone https://huggingface.co/TTPlanet/TTPLanet_SDXL_Controlnet_Tile_Realistic
114
- mv TTPLanet_SDXL_Controlnet_Tile_Realistic/TTPLANET_Controlnet_Tile_realistic_v2_fp16.safetensors TTPLanet_SDXL_Controlnet_Tile_Realistic/diffusion_pytorch_model.safetensors
115
- ```
116
- ### 3. Inference 🚀
117
-
118
- ```python
119
- import torch
120
- from ip_adapter.utils import resize_content
121
- import numpy as np
122
- from ip_adapter.utils import BLOCKS as BLOCKS
123
- from ip_adapter.utils import controlnet_BLOCKS as controlnet_BLOCKS
124
- from PIL import Image
125
- from diffusers import (
126
- AutoencoderKL,
127
- ControlNetModel,
128
- StableDiffusionXLControlNetPipeline,
129
-
130
- )
131
- from ip_adapter import CSGO
132
-
133
-
134
- device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
135
-
136
- base_model_path = "./base_models/stable-diffusion-xl-base-1.0"
137
- image_encoder_path = "./base_models/IP-Adapter/sdxl_models/image_encoder"
138
- csgo_ckpt = "./CSGO/csgo.bin"
139
- pretrained_vae_name_or_path ='./base_models/sdxl-vae-fp16-fix'
140
- controlnet_path = "./base_models/TTPLanet_SDXL_Controlnet_Tile_Realistic"
141
- weight_dtype = torch.float16
142
-
143
-
144
- vae = AutoencoderKL.from_pretrained(pretrained_vae_name_or_path,torch_dtype=torch.float16)
145
- controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16,use_safetensors=True)
146
- pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
147
- base_model_path,
148
- controlnet=controlnet,
149
- torch_dtype=torch.float16,
150
- add_watermarker=False,
151
- vae=vae
152
- )
153
- pipe.enable_vae_tiling()
154
-
155
-
156
- target_content_blocks = BLOCKS['content']
157
- target_style_blocks = BLOCKS['style']
158
- controlnet_target_content_blocks = controlnet_BLOCKS['content']
159
- controlnet_target_style_blocks = controlnet_BLOCKS['style']
160
-
161
- csgo = CSGO(pipe, image_encoder_path, csgo_ckpt, device, num_content_tokens=4,num_style_tokens=32,
162
- target_content_blocks=target_content_blocks, target_style_blocks=target_style_blocks,controlnet_adapter=True,
163
- controlnet_target_content_blocks=controlnet_target_content_blocks,
164
- controlnet_target_style_blocks=controlnet_target_style_blocks,
165
- content_model_resampler=True,
166
- style_model_resampler=True,
167
-
168
- )
169
-
170
- style_name = 'img_1.png'
171
- content_name = 'img_0.png'
172
- style_image = Image.open("../assets/{}".format(style_name)).convert('RGB')
173
- content_image = Image.open('../assets/{}'.format(content_name)).convert('RGB')
174
-
175
- caption ='a small house with a sheep statue on top of it'
176
-
177
- num_sample=4
178
-
179
- #image-driven style transfer
180
- images = csgo.generate(pil_content_image= content_image, pil_style_image=style_image,
181
- prompt=caption,
182
- negative_prompt= "text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
183
- content_scale=1.0,
184
- style_scale=1.0,
185
- guidance_scale=10,
186
- num_images_per_prompt=num_sample,
187
- num_samples=1,
188
- num_inference_steps=50,
189
- seed=42,
190
- image=content_image.convert('RGB'),
191
- controlnet_conditioning_scale=0.6,
192
- )
193
-
194
- #text editing-driven stylized synthesis
195
- caption='a small house'
196
- images = csgo.generate(pil_content_image= content_image, pil_style_image=style_image,
197
- prompt=caption,
198
- negative_prompt= "text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
199
- content_scale=1.0,
200
- style_scale=1.0,
201
- guidance_scale=10,
202
- num_images_per_prompt=num_sample,
203
- num_samples=1,
204
- num_inference_steps=50,
205
- seed=42,
206
- image=content_image.convert('RGB'),
207
- controlnet_conditioning_scale=0.4,
208
- )
209
-
210
- #text-driven stylized synthesis
211
- caption='a cat'
212
- #If the content image still interferes with the generated results, set the content image to an empty image.
213
- # content_image =Image.fromarray(np.zeros((content_image.size[0],content_image.size[1], 3), dtype=np.uint8)).convert('RGB')
214
-
215
- images = csgo.generate(pil_content_image= content_image, pil_style_image=style_image,
216
- prompt=caption,
217
- negative_prompt= "text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
218
- content_scale=1.0,
219
- style_scale=1.0,
220
- guidance_scale=10,
221
- num_images_per_prompt=num_sample,
222
- num_samples=1,
223
- num_inference_steps=50,
224
- seed=42,
225
- image=content_image.convert('RGB'),
226
- controlnet_conditioning_scale=0.01,
227
- )
228
- ```
229
-
230
- ## Demos
231
- <p align="center">
232
- <br>
233
- 🔥 For more results, visit our <a href="https://csgo-gen.github.io"><strong>homepage</strong></a> 🔥
234
- </p>
235
-
236
- ### Content-Style Composition
237
- <p align="center">
238
- <img src="assets/page1.png">
239
- </p>
240
-
241
- <p align="center">
242
- <img src="assets/page4.png">
243
- </p>
244
-
245
- ### Cycle Translation
246
- <p align="center">
247
- <img src="assets/page8.png">
248
- </p>
249
-
250
- ### Text-Driven Style Synthesis
251
- <p align="center">
252
- <img src="assets/page10.png">
253
- </p>
254
-
255
- ### Text Editing-Driven Style Synthesis
256
- <p align="center">
257
- <img src="assets/page11.jpg">
258
- </p>
259
-
260
- ## Star History
261
- [![Star History Chart](https://api.star-history.com/svg?repos=instantX-research/CSGO&type=Date)](https://star-history.com/#instantX-research/CSGO&Date)
262
-
263
-
264
-
265
- ## Acknowledgements
266
- This project is developed by InstantX Team, all copyright reserved.
267
-
268
- ## Citation 💖
269
- If you find CSGO useful for your research, welcome to 🌟 this repo and cite our work using the following BibTeX:
270
- ```bibtex
271
- @article{xing2024csgo,
272
- title={CSGO: Content-Style Composition in Text-to-Image Generation},
273
- author={Peng Xing and Haofan Wang and Yanpeng Sun and Qixun Wang and Xu Bai and Hao Ai and Renyuan Huang and Zechao Li},
274
- year={2024},
275
- journal = {arXiv 2408.16766},
276
- }
277
  ```
 
1
+ ---
2
+ license: apache-2.0
3
+ ---
4
+ <div align="center">
5
+
6
+ [//]: # (<h1>CSGO: Content-Style Composition in Text-to-Image Generation</h1>)
7
+
8
+ [//]: # ()
9
+ [//]: # ([**Peng Xing**]&#40;https://github.com/xingp-ng&#41;<sup>12*</sup> · [**Haofan Wang**]&#40;https://haofanwang.github.io/&#41;<sup>1*</sup> · [**Yanpeng Sun**]&#40;https://scholar.google.com.hk/citations?user=a3FI8c4AAAAJ&hl=zh-CN&oi=ao/&#41;<sup>2</sup> · [**Qixun Wang**]&#40;https://github.com/wangqixun&#41;<sup>1</sup> · [**Xu Bai**]&#40;https://huggingface.co/baymin0220&#41;<sup>1</sup> · [**Hao Ai**]&#40;https://github.com/aihao2000&#41;<sup>13</sup> · [**Renyuan Huang**]&#40;https://github.com/DannHuang&#41;<sup>14</sup> · [**Zechao Li**]&#40;https://zechao-li.github.io/&#41;<sup>2✉</sup>)
10
+
11
+ [//]: # ()
12
+ [//]: # (<sup>1</sup>InstantX Team · <sup>2</sup>Nanjing University of Science and Technology · <sup>3</sup>Beihang University · <sup>4</sup>Peking University)
13
+
14
+ [//]: # (<sup>*</sup>equal contributions, <sup>✉</sup>corresponding authors)
15
+
16
+ <a href='https://csgo-gen.github.io/'><img src='https://img.shields.io/badge/Project-Page-green'></a>
17
+ <a href='https://arxiv.org/abs/2408.16766'><img src='https://img.shields.io/badge/Technique-Report-red'></a>
18
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-App-red)](https://huggingface.co/spaces/xingpng/CSGO/)
19
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/spaces/InstantX/CSGO)
20
+
21
+
22
+ </div>
23
+
24
+
25
+ [//]: # (## Updates 🔥)
26
+
27
+ [//]: # ()
28
+ [//]: # ([//]: # &#40;- **`2024/07/19`**: ✨ We support 🎞️ portrait video editing &#40;aka v2v&#41;! More to see [here]&#40;assets/docs/changelog/2024-07-19.md&#41;.&#41;)
29
+ [//]: # ()
30
+ [//]: # ([//]: # &#40;- **`2024/07/17`**: 🍎 We support macOS with Apple Silicon, modified from [jeethu]&#40;https://github.com/jeethu&#41;'s PR [#143]&#40;https://github.com/KwaiVGI/LivePortrait/pull/143&#41;.&#41;)
31
+ [//]: # ()
32
+ [//]: # ([//]: # &#40;- **`2024/07/10`**: 💪 We support audio and video concatenating, driving video auto-cropping, and template making to protect privacy. More to see [here]&#40;assets/docs/changelog/2024-07-10.md&#41;.&#41;)
33
+ [//]: # ()
34
+ [//]: # ([//]: # &#40;- **`2024/07/09`**: 🤗 We released the [HuggingFace Space]&#40;https://huggingface.co/spaces/KwaiVGI/liveportrait&#41;, thanks to the HF team and [Gradio]&#40;https://github.com/gradio-app/gradio&#41;!&#41;)
35
+ [//]: # ([//]: # &#40;Continuous updates, stay tuned!&#41;)
36
+ [//]: # (- **`2024/08/30`**: 😊 We released the initial version of the inference code.)
37
+
38
+ [//]: # (- **`2024/08/30`**: 😊 We released the technical report on [arXiv]&#40;https://arxiv.org/pdf/2408.16766&#41;)
39
+
40
+ [//]: # (- **`2024/07/15`**: 🔥 We released the [homepage]&#40;https://csgo-gen.github.io&#41;.)
41
+
42
+ [//]: # ()
43
+ [//]: # (## Plan 💪)
44
+
45
+ [//]: # (- [x] technical report)
46
+
47
+ [//]: # (- [x] inference code)
48
+
49
+ [//]: # (- [ ] pre-trained weight)
50
+
51
+ [//]: # (- [ ] IMAGStyle dataset)
52
+
53
+ [//]: # (- [ ] training code)
54
+
55
+ ## Introduction 📖
56
+ This repo, named **CSGO**, contains the official PyTorch implementation of our paper [CSGO: Content-Style Composition in Text-to-Image Generation](https://arxiv.org/pdf/).
57
+ We are actively updating and improving this repository. If you find a bug or have a suggestion, please open an issue or submit a pull request (PR) 💖.
58
+
59
+ ## Detail ✨
60
+ We currently release two sets of model weights.
61
+
62
+ | Model | Content tokens | Style tokens | Other |
63
+ |:------------:|:-----------:|:-----------:|:---------------:|
64
+ |csgo.bin|4|16| - |
65
+ |csgo_4_32.bin|4|32| Deepspeed zero2 |
66
+
67
+
68
+ ## Pipeline 💻
69
+ <p align="center">
70
+ <img src="assets/image3_1.jpg">
71
+ </p>
72
+
73
+ ## Capabilities 🚅
74
+
75
+ 🔥 Our CSGO achieves **image-driven style transfer, text-driven stylized synthesis, and text editing-driven stylized synthesis**.
76
+
77
+ 🔥 For more results, visit our <a href="https://csgo-gen.github.io"><strong>homepage</strong></a> 🔥
78
+
79
+ <p align="center">
80
+ <img src="assets/vis.jpg">
81
+ </p>
82
+
83
+
84
+ ## Getting Started 🏁
85
+ ### 1. Clone the code and prepare the environment
86
+ ```bash
87
+ git clone https://github.com/instantX-research/CSGO
88
+ cd CSGO
89
+
90
+ # create env using conda
91
+ conda create -n CSGO python=3.9
92
+ conda activate CSGO
93
+
94
+ # install dependencies with pip
95
+ # for Linux and Windows users
96
+ pip install -r requirements.txt
97
+ ```
98
+
99
+ ### 2. Download pretrained weights (coming soon)
100
+
101
+ The easiest way to download the pretrained weights is from HuggingFace:
102
+ ```bash
103
+ # first, ensure git-lfs is installed, see: https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage
104
+ git lfs install
105
+ # clone and move the weights
106
+ git clone https://huggingface.co/InstantX/CSGO CSGO
107
+ ```
108
+ Our method is fully compatible with [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), [VAE](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix), [ControlNet](https://huggingface.co/TTPlanet/TTPLanet_SDXL_Controlnet_Tile_Realistic), and [Image Encoder](https://huggingface.co/h94/IP-Adapter/tree/main/sdxl_models/image_encoder).
109
+ Please download them and place them in the ./base_models folder.
110
+
111
+ Tip: If you want to load the ControlNet directly through the ControlNet pipeline, as CSGO does, rename the checkpoint file as follows:
112
+ ```bash
113
+ git clone https://huggingface.co/TTPlanet/TTPLanet_SDXL_Controlnet_Tile_Realistic
114
+ mv TTPLanet_SDXL_Controlnet_Tile_Realistic/TTPLANET_Controlnet_Tile_realistic_v2_fp16.safetensors TTPLanet_SDXL_Controlnet_Tile_Realistic/diffusion_pytorch_model.safetensors
115
+ ```
116
+ ### 3. Inference 🚀
117
+
118
+ ```python
119
+ import torch
120
+ from ip_adapter.utils import resize_content
121
+ import numpy as np
122
+ from ip_adapter.utils import BLOCKS as BLOCKS
123
+ from ip_adapter.utils import controlnet_BLOCKS as controlnet_BLOCKS
124
+ from PIL import Image
125
+ from diffusers import (
126
+ AutoencoderKL,
127
+ ControlNetModel,
128
+ StableDiffusionXLControlNetPipeline,
129
+
130
+ )
131
+ from ip_adapter import CSGO
132
+
133
+
134
+ device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
135
+
136
+ base_model_path = "./base_models/stable-diffusion-xl-base-1.0"
137
+ image_encoder_path = "./base_models/IP-Adapter/sdxl_models/image_encoder"
138
+ csgo_ckpt = "./CSGO/csgo.bin"
139
+ pretrained_vae_name_or_path ='./base_models/sdxl-vae-fp16-fix'
140
+ controlnet_path = "./base_models/TTPLanet_SDXL_Controlnet_Tile_Realistic"
141
+ weight_dtype = torch.float16
142
+
143
+
144
+ vae = AutoencoderKL.from_pretrained(pretrained_vae_name_or_path,torch_dtype=torch.float16)
145
+ controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16,use_safetensors=True)
146
+ pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
147
+ base_model_path,
148
+ controlnet=controlnet,
149
+ torch_dtype=torch.float16,
150
+ add_watermarker=False,
151
+ vae=vae
152
+ )
153
+ pipe.enable_vae_tiling()
154
+
155
+
156
+ target_content_blocks = BLOCKS['content']
157
+ target_style_blocks = BLOCKS['style']
158
+ controlnet_target_content_blocks = controlnet_BLOCKS['content']
159
+ controlnet_target_style_blocks = controlnet_BLOCKS['style']
160
+
161
+ csgo = CSGO(pipe, image_encoder_path, csgo_ckpt, device, num_content_tokens=4,num_style_tokens=32,
162
+ target_content_blocks=target_content_blocks, target_style_blocks=target_style_blocks,controlnet_adapter=True,
163
+ controlnet_target_content_blocks=controlnet_target_content_blocks,
164
+ controlnet_target_style_blocks=controlnet_target_style_blocks,
165
+ content_model_resampler=True,
166
+ style_model_resampler=True,
167
+
168
+ )
169
+
170
+ style_name = 'img_1.png'
171
+ content_name = 'img_0.png'
172
+ style_image = Image.open("../assets/{}".format(style_name)).convert('RGB')
173
+ content_image = Image.open('../assets/{}'.format(content_name)).convert('RGB')
174
+
175
+ caption ='a small house with a sheep statue on top of it'
176
+
177
+ num_sample=4
178
+
179
+ #image-driven style transfer
180
+ images = csgo.generate(pil_content_image= content_image, pil_style_image=style_image,
181
+ prompt=caption,
182
+ negative_prompt= "text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
183
+ content_scale=1.0,
184
+ style_scale=1.0,
185
+ guidance_scale=10,
186
+ num_images_per_prompt=num_sample,
187
+ num_samples=1,
188
+ num_inference_steps=50,
189
+ seed=42,
190
+ image=content_image.convert('RGB'),
191
+ controlnet_conditioning_scale=0.6,
192
+ )
193
+
194
+ #text editing-driven stylized synthesis
195
+ caption='a small house'
196
+ images = csgo.generate(pil_content_image= content_image, pil_style_image=style_image,
197
+ prompt=caption,
198
+ negative_prompt= "text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
199
+ content_scale=1.0,
200
+ style_scale=1.0,
201
+ guidance_scale=10,
202
+ num_images_per_prompt=num_sample,
203
+ num_samples=1,
204
+ num_inference_steps=50,
205
+ seed=42,
206
+ image=content_image.convert('RGB'),
207
+ controlnet_conditioning_scale=0.4,
208
+ )
209
+
210
+ #text-driven stylized synthesis
211
+ caption='a cat'
212
+ #If the content image still interferes with the generated results, set the content image to an empty image.
213
+ # content_image = Image.fromarray(np.zeros((content_image.size[1], content_image.size[0], 3), dtype=np.uint8)).convert('RGB')
214
+
215
+ images = csgo.generate(pil_content_image= content_image, pil_style_image=style_image,
216
+ prompt=caption,
217
+ negative_prompt= "text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
218
+ content_scale=1.0,
219
+ style_scale=1.0,
220
+ guidance_scale=10,
221
+ num_images_per_prompt=num_sample,
222
+ num_samples=1,
223
+ num_inference_steps=50,
224
+ seed=42,
225
+ image=content_image.convert('RGB'),
226
+ controlnet_conditioning_scale=0.01,
227
+ )
228
+ ```
229
+
230
+ ## Demos
231
+ <p align="center">
232
+ <br>
233
+ 🔥 For more results, visit our <a href="https://csgo-gen.github.io"><strong>homepage</strong></a> 🔥
234
+ </p>
235
+
236
+ ### Content-Style Composition
237
+ <p align="center">
238
+ <img src="assets/page1.png">
239
+ </p>
240
+
241
+ <p align="center">
242
+ <img src="assets/page4.png">
243
+ </p>
244
+
245
+ ### Cycle Translation
246
+ <p align="center">
247
+ <img src="assets/page8.png">
248
+ </p>
249
+
250
+ ### Text-Driven Style Synthesis
251
+ <p align="center">
252
+ <img src="assets/page10.png">
253
+ </p>
254
+
255
+ ### Text Editing-Driven Style Synthesis
256
+ <p align="center">
257
+ <img src="assets/page11.jpg">
258
+ </p>
259
+
260
+ ## Star History
261
+ [![Star History Chart](https://api.star-history.com/svg?repos=instantX-research/CSGO&type=Date)](https://star-history.com/#instantX-research/CSGO&Date)
262
+
263
+
264
+
265
+ ## Acknowledgements
266
+ This project is developed by the InstantX Team. All rights reserved.
267
+
268
+ ## Citation 💖
269
+ If you find CSGO useful for your research, please 🌟 this repo and cite our work using the following BibTeX:
270
+ ```bibtex
271
+ @article{xing2024csgo,
272
+ title={CSGO: Content-Style Composition in Text-to-Image Generation},
273
+ author={Peng Xing and Haofan Wang and Yanpeng Sun and Qixun Wang and Xu Bai and Hao Ai and Renyuan Huang and Zechao Li},
274
+ year={2024},
275
+ journal = {arXiv 2408.16766},
276
+ }
277
  ```
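
The inference example in section 3 of the README loads several models from fixed relative paths, so a missing download only shows up as a loading error. The following is a minimal sketch of a pre-flight check, not part of the CSGO code base: the helper names `find_missing` and `token_counts` and the dictionary labels are hypothetical, while the paths and token counts are copied from the README. Note that the inference snippet pairs `csgo.bin` with `num_style_tokens=32`, whereas the table in the "Detail" section lists 16 style tokens for that file, so it may be worth checking which combination matches the checkpoint you downloaded.

```python
from pathlib import Path

# Relative paths the README's inference snippet loads from. The labels are
# chosen for this sketch; they are not names used by the CSGO code base.
EXPECTED_PATHS = {
    "SDXL base model": "base_models/stable-diffusion-xl-base-1.0",
    "image encoder": "base_models/IP-Adapter/sdxl_models/image_encoder",
    "VAE": "base_models/sdxl-vae-fp16-fix",
    "ControlNet": "base_models/TTPLanet_SDXL_Controlnet_Tile_Realistic",
    "CSGO checkpoint": "CSGO/csgo.bin",
}

# (content tokens, style tokens) per released checkpoint, copied from the
# table in the README's "Detail" section.
CHECKPOINT_TOKENS = {
    "csgo.bin": (4, 16),
    "csgo_4_32.bin": (4, 32),
}


def find_missing(root="."):
    """Return the labels of expected paths that do not exist under root."""
    base = Path(root)
    return [label for label, rel in EXPECTED_PATHS.items()
            if not (base / rel).exists()]


def token_counts(checkpoint_path):
    """Look up (content tokens, style tokens) by checkpoint file name."""
    return CHECKPOINT_TOKENS[Path(checkpoint_path).name]


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected paths found.")
    print("csgo.bin tokens:", token_counts("./CSGO/csgo.bin"))
```

Running the script from the repository root before the inference example lists any model directory that still needs to be downloaded. The values returned by `token_counts` could then be passed as `num_content_tokens` and `num_style_tokens`, assuming the table reflects how each checkpoint was trained.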