metadata

frameworks:
  - Pytorch
tasks:
  - text-to-image-synthesis
base_model:
  - Qwen/Qwen-Image
base_model_relation: adapter

Qwen-Image Image Structure Control Model - Depth ControlNet

Model Introduction

This model is an image structure control model trained on top of Qwen-Image. It uses a ControlNet architecture, which constrains the structure of the generated image according to a depth map. It was trained with DiffSynth-Studio on the BLIP3o dataset.
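
ControlNet-style conditioning is commonly built around zero-initialized connections: the control branch's output projection starts at zero, so at the beginning of training the adapter leaves the frozen base model's output unchanged, and the control signal is learned gradually. This card does not describe the internals of the blockwise variant used here, so the plain-Python sketch below only illustrates that general principle. `ZeroInitControlBranch`, `matmul`, and the toy dimensions are hypothetical names chosen for illustration, not DiffSynth-Studio APIs.

```python
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as nested lists."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


class ZeroInitControlBranch:
    """Hypothetical control branch: projects control features (e.g. encoded
    depth-map tokens) and adds them as a residual to a frozen block's hidden
    states. The projection starts at zero, so the base output passes through
    unchanged until training makes the weights non-zero."""

    def __init__(self, dim, scale=1.0):
        self.weight = [[0.0] * dim for _ in range(dim)]  # zero-initialized projection
        self.scale = scale  # strength of the control signal

    def __call__(self, hidden, control):
        residual = matmul(control, self.weight)
        return [
            [h + self.scale * r for h, r in zip(h_row, r_row)]
            for h_row, r_row in zip(hidden, residual)
        ]


random.seed(0)
dim, tokens = 4, 3
hidden = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(tokens)]   # base block output
control = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(tokens)]  # control features

branch = ZeroInitControlBranch(dim)
out_at_init = branch(hidden, control)  # identical to `hidden` at initialization

# Simulate a training update that makes one projection weight non-zero.
branch.weight[0][0] = 0.5
out_trained = branch(hidden, control)  # column 0 now depends on the control features
```

Because only the projection is zero at the start, gradients can still flow into it, which is why this design lets an adapter be attached to a pretrained model without disturbing its initial behavior.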

Example Results

[Figure: Structure Map | Generated Image 1 | Generated Image 2]

Inference Code

# Install DiffSynth-Studio from source
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .

from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig, ControlNetInput
from PIL import Image
import torch
from modelscope import dataset_snapshot_download


# Load the Qwen-Image base components (DiT, text encoder, VAE, tokenizer) plus the depth ControlNet weights.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
        # Depth ControlNet adapter weights
        ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Depth", origin_file_pattern="model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)

# Download an example depth map to use as the structure condition.
dataset_snapshot_download(
    dataset_id="DiffSynth-Studio/example_image_dataset",
    local_dir="./data/example_image_dataset",
    allow_file_pattern="depth/image_1.jpg"
)

# Load the depth map and resize it to 1328x1328 pixels.
controlnet_image = Image.open("data/example_image_dataset/depth/image_1.jpg").resize((1328, 1328))

# Generate an image whose layout follows the depth map.
prompt = "Exquisite portrait of an underwater girl with flowing blue dress and fluttering hair. Transparent light and shadow, surrounded by bubbles. Her face is serene, with exquisite details and dreamy beauty."
image = pipe(
    prompt, seed=0,
    blockwise_controlnet_inputs=[ControlNetInput(image=controlnet_image)]
)
image.save("image.jpg")
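
The example above resizes the depth map to a fixed 1328x1328 square, which distorts depth maps with a different aspect ratio. One option for non-square inputs is to choose a size with roughly the same pixel count that keeps the aspect ratio. The helper below is a hypothetical sketch, not part of DiffSynth-Studio: `control_image_size` is an illustrative name, and it assumes both sides should be multiples of 16 (8x VAE downsampling times 2x2 patching), which you should verify against your installed version.

```python
import math


def control_image_size(width, height, target_area=1328 * 1328, multiple=16):
    """Return a (width, height) with about `target_area` pixels that keeps the
    input aspect ratio, with both sides rounded to a multiple of `multiple`.

    Assumption: 16 is the required divisibility factor for Qwen-Image inputs.
    """
    aspect = width / height
    new_height = math.sqrt(target_area / aspect)
    new_width = new_height * aspect

    def round_to_multiple(x):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(x / multiple)) * multiple)

    return round_to_multiple(new_width), round_to_multiple(new_height)


square = control_image_size(1328, 1328)
landscape = control_image_size(1920, 1080)
```

To apply it with PIL, compute `size = control_image_size(*depth_map.size)` (PIL reports `size` as width, height) and call `depth_map.resize(size)`. The generated image size would then presumably need to match, so check the `QwenImagePipeline` call signature in your version for the corresponding height and width arguments.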


license: apache-2.0