<div style="display: flex; justify-content: center; align-items: center;">
  <img src="./images/images_alibaba.png" alt="alibaba" style="width: 20%; height: auto; margin-right: 5%;">
  <img src="./images/images_alimama.png" alt="alimama" style="width: 20%; height: auto;">
</div>

EcomID aims to generate customized images from a single reference ID image, maintaining strong semantic consistency while allowing control through facial keypoints.

This repository provides the EcomID method and model, combining the strengths of [PuLID](https://github.com/ToTheBeginning/PuLID) and [InstantID](https://github.com/instantX-research/InstantID) for better background consistency, facial keypoint control, and realistic facial representation with improved similarity.

# EcomID Overview

## EcomID Structure
  <img src="./images/overflow.png" alt="EcomID structure overview" style="width: 100%; height: auto; margin-right: 5%;">


- **IP-Adapter of PuLID**: EcomID incorporates the ID-Encoder and cross-attention components from PuLID, trained with alignment loss. This reduces the interference of ID embeddings with text embeddings in the cross-attention layers, minimizing disruption to the underlying model's text-to-image capabilities.
- **InstantID’s IdentityNet Architecture**: Trained on **a dataset of 2 million aesthetically pleasing portrait images**, IdentityNet enhances keypoint control, improving ID consistency and facial realism. During training, the IP-Adapter is frozen and only IdentityNet is trained. Facial landmarks are used as conditional inputs, while face embeddings are integrated into IdentityNet via cross-attention.
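
As a rough illustration of the cross-attention step mentioned above, the sketch below implements plain scaled dot-product attention in pure Python: each image feature (query) takes a weighted average of the ID tokens (values). It is a generic textbook formulation, not EcomID's actual code. The function names, the toy dimensions, and the omission of learned projection matrices are all assumptions made for brevity.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Each query row (an image feature) attends over the key/value rows (ID tokens)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # One attention weight per ID token; the weights sum to 1.
        weights = softmax([dot(q, k) * scale for k in keys])
        # Weighted average of the value rows, computed per output channel.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

In a real model the queries, keys, and values would come from learned linear projections and would be computed with a tensor library rather than nested lists.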

# Showcases
## Comparison with Other Methods
### 1. Preserved Text-to-Image Capability

<table>
    <tr>
        <th style="width: 28%;">Prompt</th>
        <th style="width: 24%;">Reference Image</th>
        <th style="width: 24%;">EcomID</th>
        <th style="width: 24%;">InstantID</th>
    </tr>
    <tr>
        <td style="font-size: 12px;">girl, white skin, black hair, long wavy hair, <span style="color:red"><strong>in European style living room, Retro tone, decorations</strong></span>, depth of field.</td>
        <td><img src="images/show_case/50.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/49.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/48.png" alt="InstantID image" width="100%"></td>
    </tr>
</table>

As shown above, EcomID ***preserves background generation abilities while minimizing stylization, greatly enhancing realism***. 
The visualizations highlight more authentic portraits with improved background semantic consistency, showcasing EcomID's advantage in generating realistic images.

### 2. Improved Facial Control and Consistency
<table>
    <tr>
        <th style="width: 24%;">Prompt</th>
        <th style="width: 19%;">Reference Image</th>
        <th style="width: 19%;">EcomID</th>
        <th style="width: 19%;">InstantID</th>
        <th style="width: 19%;">PuLID</th>
    </tr>
    <tr>
        <td style="font-size: 12px;">A close-up portrait of a man standing in the library, holding <span style="color:red"><strong>two smiling toddlers</strong></span> next to him.</td>
        <td><img src="images/show_case/20.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/17.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/18.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/19.png" alt="PuLID image" width="100%"></td>
    </tr>
</table>

As shown above, EcomID employs keypoints as conditional inputs for training, ***allowing for precise adjustments of facial positions, sizes, and orientations***. This capability ensures that the generated portraits are more controllable while further enhancing facial similarity and the overall quality of the images.
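
To make the idea of keypoint control concrete, here is a minimal sketch of how a set of facial keypoints could be resized, rotated, and moved before being used as the conditional input. The five-point layout, the pixel coordinates, and the `transform_keypoints` helper are hypothetical and chosen only for illustration; they are not taken from the EcomID code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def transform_keypoints(
    keypoints: List[Point],
    scale: float = 1.0,
    angle_deg: float = 0.0,
    shift: Point = (0.0, 0.0),
) -> List[Point]:
    """Scale and rotate keypoints about their centroid, then translate them."""
    cx = sum(x for x, _ in keypoints) / len(keypoints)
    cy = sum(y for _, y in keypoints) / len(keypoints)
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    moved = []
    for x, y in keypoints:
        # Offset from the centroid, scaled to change the face size.
        dx, dy = (x - cx) * scale, (y - cy) * scale
        # Standard 2D rotation to change the face orientation.
        rx = dx * cos_a - dy * sin_a
        ry = dx * sin_a + dy * cos_a
        # Translate to change the face position.
        moved.append((cx + rx + shift[0], cy + ry + shift[1]))
    return moved


# Hypothetical 5-point face layout in pixel coordinates on a 1024x1024 canvas:
# left eye, right eye, nose tip, left mouth corner, right mouth corner.
face = [(450.0, 400.0), (574.0, 400.0), (512.0, 470.0), (462.0, 540.0), (562.0, 540.0)]

# Make the face half as large and move it 200 px to the right.
smaller_face = transform_keypoints(face, scale=0.5, shift=(200.0, 0.0))
```

The transformed points would then be rendered into a keypoint image and passed to the model in place of the keypoints detected on the reference photo.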

### More Showcases
EcomID enhances portrait representation, delivering a more authentic and aesthetically pleasing appearance while ensuring semantic consistency and greater internal ID similarity (i.e., similarity in traits that do not vary with age, hairstyle, glasses, or other physical changes).

<table>
    <tr>
        <th style="width: 24%;">Prompt</th>
        <th style="width: 19%;">Reference Image</th>
        <th style="width: 19%;">EcomID</th>
        <th style="width: 19%;">InstantID</th>
        <th style="width: 19%;">PuLID</th>
    </tr>
    <tr>
        <td style="font-size: 12px;">A close-up portrait of a <span style="color:red"><strong>little girl with double braids</strong></span>, wearing a white dress, standing on the beach during sunset.</td>
        <td><img src="images/show_case/21.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/22.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/23.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/24.png" alt="PuLID image" width="100%"></td>
    </tr>
    <tr>
        <td style="font-size: 12px;">A close-up portrait of a <span style="color:red"><strong>very little girl</strong></span> with double braids, wearing <span style="color:red"><strong>a hat</strong></span> and white dress, standing on the beach during sunset.</td>
        <td><img src="images/show_case/44.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/47.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/46.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/45.png" alt="PuLID image" width="100%"></td>
    </tr>
    <tr>
        <td style="font-size: 12px;">A grizzled detective, <span style="color:red"><strong>fedora</strong></span> casting a shadow over his square jaw, a <span style="color:red"><strong>cigar dangling from his lips</strong></span>, his trench coat evocative of film noir, in a <span style="color:red"><strong>rainy alley</strong></span>.</td>
        <td><img src="images/show_case/25.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/26.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/27.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/28.png" alt="PuLID image" width="100%"></td>
    </tr>
    <tr>
        <td style="font-size: 12px;">A smiling girl with <span style="color:red"><strong>bangs and long hair</strong></span> in a school uniform stands under cherry trees, holding a book.</td>
        <td><img src="images/show_case/29.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/30.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/31.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/32.png" alt="PuLID image" width="100%"></td>
    </tr>
    <tr>
        <td style="font-size: 12px;">A <span style="color:red"><strong>very old</strong></span> witch, wearing a black cloak, with a pointed hat, holding a magic wand, against a background of a misty forest.</td>
        <td><img src="images/show_case/33.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/34.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/35.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/36.png" alt="PuLID image" width="100%"></td>
    </tr>
    <tr>
        <td style="font-size: 12px;">A man clad in cyberpunk fashion: <span style="color:red"><strong>neon accents, reflective sunglasses,</strong></span> and a leather jacket with glowing circuit patterns. He stands stoically amidst a soaked cityscape.</td>
        <td><img src="images/show_case/37.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/38.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/39.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/40.png" alt="PuLID image" width="100%"></td>
    </tr>

</table>

### More Base Models, Resolutions, and Styles
<table>
    <tr>
        <th style="width: 12%;">SDXL models</th>
        <th style="width: 24%;">Prompt</th>
        <th style="width: 16%;">Reference Image</th>
        <th style="width: 16%;">EcomID</th>
        <th style="width: 16%;">InstantID</th>
        <th style="width: 16%;">PuLID</th>
    </tr>
    <tr>
        <td>sd-xl-base-1.0</td>
        <td style="font-size: 12px;">girl, solo, brown hair, holding a little teddy bear on her hands, wearing a school uniform, standing in the library, <span style="color:red"><strong>cartoon style</strong></span>.</td>
        <td><img src="images/show_case/1.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/2.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/3.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/4.png" alt="PuLID image" width="100%"></td>
    </tr>
    <tr>
        <td>EcomXL</td>
        <td style="font-size: 12px;">A close-up portrait of a <span style="color:red"><strong>very little girl</strong></span> with double braids, wearing <span style="color:red"><strong>a hat</strong></span> and white dress, standing on the beach during sunset.</td>
        <td><img src="images/show_case/44.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/47.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/46.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/45.png" alt="PuLID image" width="100%"></td>
    </tr>
    <tr>
        <td>DreamShaperXL</td>
        <td style="font-size: 12px;">solo, looking_at_viewer, smile, brown_hair, upper_body, open_clothes, teeth, open_jacket, black_jacket, blurry_background, realistic</td>
        <td><img src="images/show_case/44.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/6.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/7.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/8.png" alt="PuLID image" width="100%"></td>
    </tr>
    <tr>
        <td>leosam_xl_v7</td>
        <td style="font-size: 12px;">A close-up portrait of a girl, solo, dress, jewelry, beach and sea, pink_dress, realistic.</td>
        <td><img src="images/show_case/9.png" alt="Reference image" width="100%"></td>
        <td><img src="images/show_case/15.png" alt="EcomID image" width="100%"></td>
        <td><img src="images/show_case/14.png" alt="InstantID image" width="100%"></td>
        <td><img src="images/show_case/16.png" alt="PuLID image" width="100%"></td>
    </tr>
</table>

### Notes
- Unless otherwise specified, the showcases are generated with the EcomXL base model. EcomID is also highly compatible with other SDXL-based models, such as [leosams-helloworld-xl](https://civitai.com/models/43977/leosams-helloworld-xl), [dreamshaper-xl](https://civitai.com/models/112902/dreamshaper-xl), and [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).
- It works very well with SDXL Turbo/Lightning, [EcomXL Inpainting ControlNet](https://huggingface.co/alimama-creative/EcomXL_controlnet_inpaint), and [EcomXL Softedge ControlNet](https://huggingface.co/alimama-creative/EcomXL_controlnet_softedge).

# How to use

## ComfyUI

- The EcomID_ComfyUI node has been released: [click here](https://code.alibaba-inc.com/ruxue.wrx/EcomID_ComfyUI)

# Training Details

The model is trained on 2M Taobao images in which human faces make up more than 3% of the image. Each image has a resolution greater than 800 and an aesthetic score above 5.5.
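
The selection criteria above can be sketched as a simple metadata filter. This is only an illustration under stated assumptions: the `ImageRecord` fields are hypothetical, the 3% figure is read as face area divided by image area, and "resolution greater than 800" is read as the shorter side in pixels. The actual data pipeline is not described in this repository.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ImageRecord:
    """Hypothetical per-image metadata; the field names are illustrative."""
    width: int
    height: int
    face_area_ratio: float  # assumed: face box area / image area, in [0, 1]
    aesthetic_score: float


def keep_for_training(
    record: ImageRecord,
    min_face_ratio: float = 0.03,
    min_resolution: int = 800,
    min_aesthetic: float = 5.5,
) -> bool:
    """Return True when the record passes all three thresholds."""
    return (
        record.face_area_ratio > min_face_ratio
        # Assumption: the resolution threshold applies to the shorter side.
        and min(record.width, record.height) > min_resolution
        and record.aesthetic_score > min_aesthetic
    )


def filter_records(records: Iterable[ImageRecord]) -> List[ImageRecord]:
    """Keep only the records that satisfy every criterion."""
    return [r for r in records if keep_for_training(r)]
```

All three comparisons are strict, matching the "greater than" and "above" wording of the criteria.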

Mixed precision: fp16

Learning rate: 1e-4

Batch size: 2

Image size: 1024x1024