---

# Qwen-Image-ControlNet-Union

This repository provides a unified ControlNet that supports 4 common control types (canny, soft edge, depth, pose) for [Qwen-Image](https://github.com/QwenLM/Qwen-Image).

# Model Cards

```python
import torch
from diffusers.utils import load_image

# ControlNet support for Qwen-Image requires a recent diffusers build:
# https://github.com/huggingface/diffusers/pull/12215
# pip install git+https://github.com/huggingface/diffusers
from diffusers import QwenImageControlNetPipeline, QwenImageControlNetModel

base_model = "Qwen/Qwen-Image"
controlnet_model = "InstantX/Qwen-Image-ControlNet-Union"

controlnet = QwenImageControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16)

pipe = QwenImageControlNetPipeline.from_pretrained(
    base_model, controlnet=controlnet, torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# canny
control_image = load_image("conds/canny.png")
prompt = "Aesthetics art, traditional asian pagoda, elaborate golden accents, sky blue and white color palette, swirling cloud pattern, digital illustration, east asian architecture, ornamental rooftop, intricate detailing on building, cultural representation."
controlnet_conditioning_scale = 1.0

# soft edge
# control_image = load_image("conds/soft_edge.png")
# prompt = "Photograph of a young man with light brown hair jumping mid-air off a large, reddish-brown rock. He's wearing a navy blue sweater, light blue shirt, gray pants, and brown shoes. His arms are outstretched, and he has a slight smile on his face. The background features a cloudy sky and a distant, leafless tree line. The grass around the rock is patchy."
# controlnet_conditioning_scale = 1.0

# depth
# control_image = load_image("conds/depth.png")
# prompt = "A swanky, minimalist living room with a huge floor-to-ceiling window letting in loads of natural light. A beige couch with white cushions sits on a wooden floor, with a matching coffee table in front. The walls are a soft, warm beige, decorated with two framed botanical prints. A potted plant chills in the corner near the window. Sunlight pours through the leaves outside, casting cool shadows on the floor."
# controlnet_conditioning_scale = 1.0

# pose
# control_image = load_image("conds/pose.png")

image = pipe(
    ...  # generation arguments elided
)

image.save("qwenimage_cn_union_result.png")
```

# Inference Setting

You can adjust control strength via `controlnet_conditioning_scale`.
- Canny: use cv2.Canny, set `controlnet_conditioning_scale` in [0.8, 1.0]
- Soft Edge: use [AnylineDetector](https://github.com/huggingface/controlnet_aux), set `controlnet_conditioning_scale` in [0.8, 1.0]

We strongly recommend using detailed prompts, especially when they include text elements. For example, use "a poster with text 'InstantX Team' on the top" instead of "a poster".

For inference with multiple conditions, please refer to this [PR](https://github.com/huggingface/diffusers/pull/12215).

# ComfyUI Support

[ComfyUI](https://www.comfy.org/) offers native support for Qwen-Image-ControlNet-Union. [Visit](https://github.com/comfyanonymous/ComfyUI/pull/9488) for more details.

# Community Support

[Liblib AI](https://www.liblib.art/) offers native support for Qwen-Image-ControlNet-Union. [Visit](https://www.liblib.art/modelinfo/4d3f51c2bf1e4c51ae8dedd8c19da827?from=personal_page&versionUuid=5b5f21d2b80445598db19e924bd3a409) for more details.

# Limitations

We find that the model may fail to preserve some details, such as small-font text, unless the text is spelled out explicitly in the prompt.

# Acknowledgements

This model is developed by the InstantX Team. All rights reserved.