---
license: mit
language:
- en
base_model:
- black-forest-labs/FLUX.1-dev
- stabilityai/stable-diffusion-3.5-medium
- stabilityai/stable-diffusion-3-medium
library_name: diffusers
tags:
- Multimodal-Image-Generation
- Image-Generation
---
# UNIC-Adapter: Unified Image-Instruction Adapter for Multimodal Image Generation
[![arXiv](https://img.shields.io/badge/arXiv-Paper-A42C25.svg)](https://arxiv.org/abs/2412.18928)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Github](https://img.shields.io/badge/GitHub-AIDC--AI/UNIC--Adapter-blue?style=flat&logo=github)](https://github.com/AIDC-AI/UNIC-Adapter)  

UNIC-Adapter is a unified image-instruction adapter that integrates multimodal instructions for controllable image generation. This model card hosts the official models for the CVPR 2025 paper "UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation".

On this model card, we release a model based on SD3 Medium that supports the tasks described in our paper. We also provide two additional models: one built on SD3.5 Medium, which is further capable of traditional computer vision perception tasks, and another built on FLUX.1-dev, which supports both instruction-based image editing and traditional computer vision perception tasks.
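
For reference, here is a minimal inference sketch for the SD3 Medium variant with diffusers. Only the base-pipeline calls are standard diffusers API; the adapter-attachment step is an assumption (`load_unic_adapter` below is a hypothetical placeholder), and the official loading and inference scripts live in the [GitHub repository](https://github.com/AIDC-AI/UNIC-Adapter).

```python
import torch
from diffusers import StableDiffusion3Pipeline
from diffusers.utils import load_image

# Load the SD3 Medium base pipeline (standard diffusers API,
# using the diffusers-format weights of the base model).
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
).to("cuda")

# ASSUMPTION: attach the UNIC-Adapter weights hosted in this repo.
# `load_unic_adapter` is a hypothetical placeholder, not a diffusers
# API; see the official GitHub repository for the real loading code.
# pipe = load_unic_adapter(pipe, "AIDC-AI/UNIC-Adapter")

# Condition image, e.g. an edge map for pixel-level control; how it
# is passed to the adapter is repo-specific, so it is unused in the
# plain SD3 text-to-image call below.
condition_image = load_image("./examples/example_0.png")

image = pipe(
    prompt="a photo of a cat sitting on a wooden chair",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("output.png")
```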

## Generated samples

### Pixel-level Control
*(Left: Condition image, Center left: SD3 Medium with UNIC-Adapter, Center right: SD3.5 Medium with UNIC-Adapter, Right: FLUX.1-dev with UNIC-Adapter)*

<img src='./examples/example_0.png' width='100%'/>

### Subject-driven Generation
*(Left: Condition image, Center left: SD3 Medium with UNIC-Adapter, Center right: SD3.5 Medium with UNIC-Adapter, Right: FLUX.1-dev with UNIC-Adapter)*

<img src='./examples/example_1.png' width='100%'/>

<img src='./examples/example_2.png' width='100%'/>

*(Left: Condition image, Center: SD3.5 Medium with UNIC-Adapter, Right: FLUX.1-dev with UNIC-Adapter)*

<img src='./examples/example_3.png' width='75%'/>

<img src='./examples/example_4.png' width='75%'/>

### Style-driven Generation
*(Left: Condition image, Center left: SD3 Medium with UNIC-Adapter, Center right: SD3.5 Medium with UNIC-Adapter, Right: FLUX.1-dev with UNIC-Adapter)*

<img src='./examples/example_5.png' width='100%'/>

<img src='./examples/example_6.png' width='100%'/>

### Image Understanding 
*(Left: Source image, Center: SD3.5 Medium with UNIC-Adapter, Right: FLUX.1-dev with UNIC-Adapter)*

<img src='./examples/example_7.png' width='75%'/>


### Image Editing
*(Left: Source image, Right: FLUX.1-dev with UNIC-Adapter)*

<img src='./examples/example_8.png' width='50%'/>
<img src='./examples/example_9.png' width='50%'/>

## License
This project is licensed under the MIT License (SPDX-License-Identifier: MIT).

The adapter models cannot be used independently; they must be paired with one of the base models listed above, and the corresponding base-model license applies:
- If you use our model in conjunction with the FLUX.1-dev model, you must review the [FLUX.1 [dev] Non-Commercial License](https://github.com/black-forest-labs/flux/blob/main/model_licenses/LICENSE-FLUX1-dev) and comply with all of its terms.
- If you use our model in conjunction with the stable-diffusion-3-medium model, you must review the [STABILITY AI COMMUNITY LICENSE AGREEMENT](https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE.md) and comply with all of its terms.
- If you use our model in conjunction with the stable-diffusion-3.5-medium model, you must review the [STABILITY AI COMMUNITY LICENSE AGREEMENT](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/blob/main/LICENSE.md) and comply with all of its terms.

## Citation
If you find this repo helpful for your research, please cite our paper:
```bibtex
@inproceedings{duan2025unic,
  title={UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation},
  author={Duan, Lunhao and Zhao, Shanshan and Yan, Wenjun and Li, Yinglun and Chen, Qing-Guo and Xu, Zhao and Luo, Weihua and Zhang, Kaifu and Gong, Mingming and Xia, Gui-Song},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={7963--7973},
  year={2025}
}
```

## Disclaimer
We used compliance-checking algorithms during training to ensure the compliance of the trained model(s) to the best of our ability. Due to the complexity of the data and the diversity of model usage scenarios, we cannot guarantee that the model is completely free of copyright issues or improper content. If you believe anything infringes on your rights or generates improper content, please contact us, and we will promptly address the matter.