---

tags:
- text-to-image
- stable-diffusion
license: apache-2.0
language:
- en
library_name: diffusers
---


# EasyRef Model Card

<div align="center">

[**Project Page**](https://easyref-gen.github.io/) **|** [**Paper**](https://arxiv.org/pdf/2412.09618) **|** [**Code**](https://github.com/TempleX98/EasyRef) **|** [🤗 **Demo**](https://huggingface.co/spaces/zongzhuofan/EasyRef)


</div>

## Introduction

EasyRef models the consistent visual elements shared across a group of reference images using a single generalist multimodal LLM, in a zero-shot setting.

<div  align="center">
<img src='examples/framework.png'>
</div>

## Demos
More visualization examples are available on our [project page](https://easyref-gen.github.io/).
### Style, Identity, and Character Preservation
<img src='examples/teaser.png'>

### Comparison with IP-Adapter

<img src='examples/qualitative.png'>

### Compatibility with ControlNet

<img src='examples/controlnet.png'>

## Inference
We provide inference code for EasyRef with SDXL in the [**easyref_demo**](https://github.com/TempleX98/EasyRef/blob/main/easyref_demo.ipynb) notebook.

### Usage Tips
- EasyRef performs best when provided with multiple reference images (more than two).
- For better identity preservation, we strongly recommend uploading multiple square face images in which the face occupies most of the frame.
- Using multimodal prompts (both reference images and a non-empty text prompt) can achieve better results.
- We set `scale=1.0` by default. Lowering the `scale` value leads to more diverse but less consistent generation results.
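
As a rough illustration of the reference-image tips above, the sketch below computes a centered square crop box for an image of a given size and checks that enough references were supplied. The helper names `center_square_box` and `has_enough_references` are hypothetical and are not part of the EasyRef code. A centered crop only makes the image square; it does not guarantee that the face fills the frame, so a face-aware crop may still be needed.

```python
from typing import Tuple


def center_square_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (left, upper, right, lower) of the largest centered square.

    Hypothetical helper: the box uses the same coordinate order that
    Pillow's Image.crop expects, so it can be passed to it directly.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


def has_enough_references(num_images: int, minimum: int = 3) -> bool:
    """Check the 'more than two reference images' tip (assumed minimum of 3)."""
    return num_images >= minimum


if __name__ == "__main__":
    # A 640x480 landscape image is cropped to its central 480x480 region.
    print(center_square_box(640, 480))
    print(has_enough_references(2))
```

With Pillow installed, the returned box could be applied as `image.crop(center_square_box(*image.size))` before passing the references to the demo notebook.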

## Cite
If you find EasyRef useful for your research and applications, please cite us using this BibTeX:

```bibtex
@article{easyref,
  title={EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM},
  author={Zong, Zhuofan and Jiang, Dongzhi and Ma, Bingqi and Song, Guanglu and Shao, Hao and Shen, Dazhong and Liu, Yu and Li, Hongsheng},
  journal={arXiv preprint arXiv:2412.09618},
  year={2024}
}
```