wanghaofan committed 8621341 (verified) · 1 parent: 0323578

Update README.md

Files changed (1): README.md +115 -3

---
license: apache-2.0
language:
- en
library_name: diffusers
pipeline_tag: image-to-image
tags:
- Image-to-Image
- ControlNet
- Diffusers
- QwenImageControlNetPipeline
- Qwen-Image
base_model: Qwen/Qwen-Image
---

# Qwen-Image-ControlNet-Union
This repository provides a unified ControlNet for [Qwen-Image](https://huggingface.co/Qwen/Qwen-Image) that supports four control types: canny, soft edge, depth, and pose.

# Model Card
- This ControlNet consists of 5 double blocks copied from the pretrained transformer layers.
- We train the model from scratch for 50K steps on a dataset of 10M high-quality general and human images.
- Training runs at 1328x1328 resolution in BFloat16, with batch size 64, learning rate 4e-5, and a text drop ratio of 0.10.
- The model supports multiple control modes (canny, soft edge, depth, and pose) and can be used like a regular ControlNet.
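
A text drop ratio of 0.10 is commonly implemented as caption dropout: each training caption is replaced by an empty prompt with probability 0.10, so the model also learns from unconditional inputs of the kind classifier-free guidance relies on. The README does not say how dropping was implemented; the sketch below assumes independent per-sample replacement with an empty string, and `drop_text` is a hypothetical helper, not part of this repository.

```python
import random


def drop_text(captions, drop_ratio=0.10, rng=None):
    """Replace each caption with an empty prompt with probability `drop_ratio`.

    Hypothetical helper: only the ratio (0.10) comes from the README; the
    per-sample replacement with "" is an assumption for illustration.
    """
    if not 0.0 <= drop_ratio <= 1.0:
        raise ValueError("drop_ratio must be in [0, 1]")
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so ratio 0.0 never drops and 1.0 always drops
    return ["" if rng.random() < drop_ratio else caption for caption in captions]


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [f"caption {i}" for i in range(64)]  # batch size 64, as in the README
    dropped = drop_text(batch, drop_ratio=0.10, rng=rng)
    print(sum(c == "" for c in dropped), "of", len(dropped), "captions dropped")
```

Passing an explicit `random.Random` instance keeps the dropout reproducible across runs.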

# Showcases
<table style="width:100%; table-layout:fixed;">
<tr>
<td><img src="./conds/canny2.png" alt="canny condition"></td>
<td><img src="./outputs/canny2.png" alt="canny output"></td>
</tr>
<tr>
<td><img src="./conds/soft_edge.png" alt="soft edge condition"></td>
<td><img src="./outputs/soft_edge.png" alt="soft edge output"></td>
</tr>
<tr>
<td><img src="./conds/depth.png" alt="depth condition"></td>
<td><img src="./outputs/depth.png" alt="depth output"></td>
</tr>
<tr>
<td><img src="./conds/pose.png" alt="pose condition"></td>
<td><img src="./outputs/pose.png" alt="pose output"></td>
</tr>
</table>

# Inference
```python
import torch
from diffusers.utils import load_image

# Until these classes are merged into diffusers, import them from local module files
from controlnet_qwenimage import QwenImageControlNetModel
from transformer_qwenimage import QwenImageTransformer2DModel
from pipeline_qwenimage_controlnet import QwenImageControlNetPipeline

base_model = "Qwen/Qwen-Image"
controlnet_model = "InstantX/Qwen-Image-ControlNet-Union"

controlnet = QwenImageControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16)
transformer = QwenImageTransformer2DModel.from_pretrained(base_model, subfolder="transformer", torch_dtype=torch.bfloat16)

pipe = QwenImageControlNetPipeline.from_pretrained(
    base_model, controlnet=controlnet, transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Uncomment one of the blocks below to switch control mode.

# canny, recommended scale: 0.8 - 1.0
# If the image contains text, it is highly recommended to write that text into the prompt
control_image = load_image("conds/canny.png")
prompt = "Aesthetics art, traditional asian pagoda, elaborate golden accents, sky blue and white color palette, swirling cloud pattern, digital illustration, east asian architecture, ornamental rooftop, intricate detailing on building, cultural representation."
controlnet_conditioning_scale = 1.0

# soft edge, recommended scale: 0.8 - 1.0
# control_image = load_image("conds/soft_edge.png")
# prompt = "Photograph of a young man with light brown hair jumping mid-air off a large, reddish-brown rock. He's wearing a navy blue sweater, light blue shirt, gray pants, and brown shoes. His arms are outstretched, and he has a slight smile on his face. The background features a cloudy sky and a distant, leafless tree line. The grass around the rock is patchy."
# controlnet_conditioning_scale = 0.9

# depth, recommended scale: 0.8 - 1.0
# control_image = load_image("conds/depth.png")
# prompt = "A swanky, minimalist living room with a huge floor-to-ceiling window letting in loads of natural light. A beige couch with white cushions sits on a wooden floor, with a matching coffee table in front. The walls are a soft, warm beige, decorated with two framed botanical prints. A potted plant chills in the corner near the window. Sunlight pours through the leaves outside, casting cool shadows on the floor."
# controlnet_conditioning_scale = 0.9

# pose, recommended scale: 0.8 - 1.0
# control_image = load_image("conds/pose.png")
# prompt = "Photograph of a young man with light brown hair and a beard, wearing a beige flat cap, black leather jacket, gray shirt, brown pants, and white sneakers. He's sitting on a concrete ledge in front of a large circular window, with a cityscape reflected in the glass. The wall is cream-colored, and the sky is clear blue. His shadow is cast on the wall."
# controlnet_conditioning_scale = 1.0

image = pipe(
    prompt=prompt,
    negative_prompt=" ",
    control_image=control_image,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    width=control_image.size[0],  # PIL .size is (width, height)
    height=control_image.size[1],
    num_inference_steps=30,
    true_cfg_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
image.save("qwenimage_cn_union_result.png")
```
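
The four alternatives in the script above differ only in the condition image, the prompt and the conditioning scale. One way to switch between them without commenting code in and out is a small lookup table. The sketch below copies the image paths and scales from the script (prompts are left out for brevity); `CONTROL_PRESETS` and `get_preset` are hypothetical names, not part of the pipeline API.

```python
# Hypothetical preset table mirroring the settings in the inference script.
# Prompts are omitted; pass your own prompt to the pipeline call.
CONTROL_PRESETS = {
    "canny": {"control_image": "conds/canny.png", "controlnet_conditioning_scale": 1.0},
    "soft_edge": {"control_image": "conds/soft_edge.png", "controlnet_conditioning_scale": 0.9},
    "depth": {"control_image": "conds/depth.png", "controlnet_conditioning_scale": 0.9},
    "pose": {"control_image": "conds/pose.png", "controlnet_conditioning_scale": 1.0},
}


def get_preset(mode):
    """Return (condition image path, conditioning scale) for a control mode."""
    try:
        preset = CONTROL_PRESETS[mode]
    except KeyError:
        supported = ", ".join(sorted(CONTROL_PRESETS))
        raise ValueError(f"unknown control mode {mode!r}; expected one of: {supported}") from None
    return preset["control_image"], preset["controlnet_conditioning_scale"]
```

For example, `path, scale = get_preset("depth")` followed by `control_image = load_image(path)` replaces the depth block, and `scale` is passed as `controlnet_conditioning_scale`.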

# Recommended Parameters
You can adjust the control strength via `controlnet_conditioning_scale`. Recommended preprocessors and scales for each mode:
- Canny: use `cv2.Canny`; set `controlnet_conditioning_scale` in [0.8, 1.0]
- Soft Edge: use [AnylineDetector](https://github.com/huggingface/controlnet_aux); set `controlnet_conditioning_scale` in [0.8, 1.0]
- Depth: use [depth-anything](https://github.com/DepthAnything/Depth-Anything-V2); set `controlnet_conditioning_scale` in [0.8, 1.0]
- Pose: use [DWPose](https://github.com/IDEA-Research/DWPose/tree/onnx); set `controlnet_conditioning_scale` in [0.8, 1.0]

We strongly recommend detailed prompts, especially when the image contains text. For example, instead of "a poster", use the prompt "A poster with a wilderness scene in the background. In the lower right corner, it says 'InstantX Team. All copyright reserved.' The headlines are 'Qwen-Image' and 'ControlNet-Union', and the date is '2025.8'."
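
When a prompt has to carry several text elements, it can help to assemble it from a scene description plus each exact string and where it appears, following the pattern of the example prompt above. `build_text_prompt` is a hypothetical convenience helper; wrapping the text in single quotes mirrors the example and is not something the pipeline requires.

```python
def build_text_prompt(scene, text_elements):
    """Compose a detailed prompt that spells out every text element verbatim.

    scene: overall description, e.g. "A poster with a wilderness scene in the background."
    text_elements: list of (placement, text) pairs; each becomes "<placement> '<text>'."
    """
    parts = [scene.strip()]
    for placement, text in text_elements:
        parts.append(f"{placement.strip()} '{text}'.")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_text_prompt(
        "A poster with a wilderness scene in the background.",
        [("The headline is", "Qwen-Image"), ("The date is", "2025.8")],
    ))
```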

# Limitations
We find that the model is unable to preserve some fine details, such as small-font text.

# Acknowledgements
This model was developed by the InstantX Team. All rights reserved.