---
frameworks:
- Pytorch
tasks:
- text-to-image-synthesis
license: apache-2.0

#model-type:
##e.g. gpt, phi, llama, chatglm, baichuan
#- gpt

#domain:
##e.g. nlp, cv, audio, multi-modal
#- nlp

#language:
##language code list: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
#- cn

#metrics:
##e.g. CIDEr, BLEU, ROUGE
#- CIDEr

#tags:
##custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
#- pretrained

#tools:
##e.g. vllm, fastchat, llamacpp, AdaSeq
#- vllm
base_model:
- Qwen/Qwen-Image
base_model_relation: adapter
---
# Qwen-Image Image Structure Control Model - Depth ControlNet

![](./assets/cover.png)

## Model Introduction

This is an image structure control model trained on top of [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image). The model architecture is ControlNet, which steers the structure of the generated image according to a depth map. The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio), and the dataset used is [BLIP3o](https://modelscope.cn/datasets/BLIP3o/BLIP3o-60k).

## Effect Demonstration

|Structure Map|Generated Image 1|Generated Image 2|
|-|-|-|
|![](./assets/depth2.jpg)|![](./assets/image2_0.jpg)|![](./assets/image2_1.jpg)|
|![](./assets/depth3.jpg)|![](./assets/image3_0.jpg)|![](./assets/image3_1.jpg)|
|![](./assets/depth1.jpg)|![](./assets/image1_0.jpg)|![](./assets/image1_1.jpg)|

## Inference Code

Install DiffSynth-Studio from source:

```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```
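
If you want to confirm the editable install is picked up before running inference, a minimal check (nothing here is specific to this model) is:

```python
# Sanity check: the `diffsynth` package should resolve to the cloned repository.
import diffsynth
print(diffsynth.__file__)
```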

```python
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig, ControlNetInput
from PIL import Image
import torch
from modelscope import dataset_snapshot_download

# Load the Qwen-Image base model together with the Depth ControlNet weights.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Depth", origin_file_pattern="model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)

# Download an example depth map from the example dataset.
dataset_snapshot_download(
    dataset_id="DiffSynth-Studio/example_image_dataset",
    local_dir="./data/example_image_dataset",
    allow_file_pattern="depth/image_1.jpg",
)

# Resize the depth map to the generation resolution.
controlnet_image = Image.open("data/example_image_dataset/depth/image_1.jpg").resize((1328, 1328))

prompt = "Exquisite portrait of an underwater girl with flowing blue dress and fluttering hair. Transparent light and shadow, surrounded by bubbles. Her face is serene, with exquisite details and dreamy beauty."

# Generate an image whose structure follows the depth map.
image = pipe(
    prompt, seed=0,
    blockwise_controlnet_inputs=[ControlNetInput(image=controlnet_image)],
)
image.save("image.jpg")
```
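
The example above reuses a depth map from the example dataset. To drive the ControlNet with your own photos you first need a depth map; the sketch below is one hypothetical way to produce one with the `transformers` depth-estimation pipeline (the library, checkpoint name, and post-processing here are assumptions, not part of this repository):

```python
# Hypothetical depth-map preparation -- assumes `transformers` is installed
# and a depth-estimation checkpoint such as Depth-Anything is available.
from transformers import pipeline
from PIL import Image

depth_estimator = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")
source = Image.open("my_photo.jpg")       # hypothetical input photo
depth = depth_estimator(source)["depth"]  # PIL image of the predicted depth
controlnet_image = depth.convert("RGB").resize((1328, 1328))
controlnet_image.save("my_depth.jpg")
```

The resulting `controlnet_image` can then be passed to `ControlNetInput(image=...)` exactly as in the inference code above.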