dimitribarbot committed
Commit e677eaf • 1 Parent(s): ec7f956

Add README
README.md CHANGED
@@ -10,7 +10,164 @@ tags:
  - diffusers-training
  ---
 
- # SDXL-controlnet: DWPose
+ # SDXL ControlNet: DWPose
 
- These are controlnet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with [DWPose](https://github.com/IDEA-Research/DWPose) conditioning.
+ These are ControlNet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with [DWPose](https://github.com/IDEA-Research/DWPose) conditioning.
 
+ ### Using in 🧨 diffusers
+
+ First, install the required libraries:
+
+ ```bash
+ pip install -q easy-dwpose transformers accelerate
+ pip install -q git+https://github.com/huggingface/diffusers
+ ```
+
+ #### Example 1
+
+ To generate a realistic DJ with the following pose:
+
+ ![Pose Image 1](./images/pose_image_1.png)
+
+ Run the following code:
+
+ ```python
+ import torch
+ from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
+ from diffusers.utils import load_image
+ from easy_dwpose import DWposeDetector
+
+ pose_image = load_image("./images/pose_image_1.png")
+
+ # Load the DWPose detector.
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+ dwpose = DWposeDetector(device=device)
+
+ # Compute the DWPose conditioning image (skeleton with hand and face keypoints).
+ skeleton = dwpose(
+     pose_image,
+     detect_resolution=pose_image.width,
+     output_type="pil",
+     include_hands=True,
+     include_face=True,
+ )
+
+ # Initialize the ControlNet pipeline.
+ controlnet = ControlNetModel.from_pretrained(
+     "dimitribarbot/controlnet-dwpose-sdxl-1.0",
+     torch_dtype=torch.float16,
+ )
+ pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-xl-base-1.0",
+     controlnet=controlnet,
+     torch_dtype=torch.float16,
+     variant="fp16",
+ ).to(device)
+
+ # Infer.
+ prompt = "DJ in a party, shallow depth of field, highly detailed, high budget, gorgeous"
+ negative_prompt = "bad quality, blur, anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
+ image = pipe(
+     prompt,
+     negative_prompt=negative_prompt,
+     num_inference_steps=50,
+     guidance_scale=5,
+     image=skeleton,
+     generator=torch.manual_seed(97),
+ ).images[0]
+ ```
+
+ The generated pose is:
+
+ ![Pose 1](./images/dwpose_1.png)
+
+ The image generated by SDXL is:
+
+ ![Generated image 1](./images/dwpose_image_1.png)
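+
+ If the output follows the skeleton too strictly or too loosely, the conditioning strength can be tuned with the pipeline's `controlnet_conditioning_scale` argument. A minimal sketch, reusing `pipe`, `prompt`, `negative_prompt` and `skeleton` from above; the value 0.5 is only an illustration, not a tuned setting:
+
+ ```python
+ # controlnet_conditioning_scale scales the ControlNet residuals:
+ # lower values weaken the pose constraint, 1.0 is the default.
+ image = pipe(
+     prompt,
+     negative_prompt=negative_prompt,
+     num_inference_steps=50,
+     guidance_scale=5,
+     image=skeleton,
+     controlnet_conditioning_scale=0.5,
+     generator=torch.manual_seed(97),
+ ).images[0]
+ ```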
+
+ #### Example 2
+
+ To generate an anime version of a woman sitting on a bench with the following pose:
+
+ ![Pose Image 2](./images/pose_image_2.png)
+
+ Run the following code:
+
+ ```python
+ import torch
+ from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
+ from diffusers.utils import load_image
+ from easy_dwpose import DWposeDetector
+
+ pose_image = load_image("./images/pose_image_2.png")
+
+ # Load the DWPose detector.
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+ dwpose = DWposeDetector(device=device)
+
+ # Compute the DWPose conditioning image (skeleton with hand and face keypoints).
+ skeleton = dwpose(
+     pose_image,
+     detect_resolution=pose_image.width,
+     output_type="pil",
+     include_hands=True,
+     include_face=True,
+ )
+
+ # Initialize the ControlNet pipeline.
+ controlnet = ControlNetModel.from_pretrained(
+     "dimitribarbot/controlnet-dwpose-sdxl-1.0",
+     torch_dtype=torch.float16,
+ )
+ pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-xl-base-1.0",
+     controlnet=controlnet,
+     torch_dtype=torch.float16,
+     variant="fp16",
+ )
+ if torch.cuda.is_available():
+     pipe.to(torch.device("cuda"))
+
+ # Infer.
+ prompt = "Anime girl sitting on a bench, highly detailed, noon, ambient light"
+ negative_prompt = "bad quality, blur, anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
+ image = pipe(
+     prompt,
+     negative_prompt=negative_prompt,
+     num_inference_steps=25,
+     guidance_scale=18,
+     image=skeleton,
+     generator=torch.manual_seed(79),
+ ).images[0]
+ ```
+
+ The generated pose is:
+
+ ![Pose 2](./images/dwpose_2.png)
+
+ The image generated by SDXL is:
+
+ ![Generated image 2](./images/dwpose_image_2.png)
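+
+ On GPUs with limited VRAM, the pipeline can be kept mostly on the CPU and moved to the GPU submodule by submodule. A minimal sketch using diffusers' model offloading (it relies on the accelerate package installed above):
+
+ ```python
+ # Replace pipe.to(device) with CPU offloading: each submodule is moved
+ # to the GPU only while it is actually running.
+ pipe.enable_model_cpu_offload()
+ ```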
+
+ ### Training
+
+ The official [training script](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md) by Hugging Face 🤗 was used.
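+
+ For reference, here is a sketch of what the corresponding invocation of the `train_controlnet_sdxl.py` script from the linked examples folder could look like, with flag values taken from the settings listed below (depending on the dataset schema you may also need `--image_column`, `--conditioning_image_column` and `--caption_column`):
+
+ ```bash
+ accelerate launch train_controlnet_sdxl.py \
+   --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
+   --dataset_name="dimitribarbot/dw_pose_controlnet" \
+   --output_dir="controlnet-dwpose-sdxl-1.0" \
+   --resolution=1024 \
+   --learning_rate=8e-5 \
+   --lr_scheduler="constant" \
+   --train_batch_size=2 \
+   --gradient_accumulation_steps=8 \
+   --max_train_steps=15000 \
+   --mixed_precision="fp16"
+ ```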
+
+ #### Training data
+ This checkpoint was trained for 15,000 steps on the [dimitribarbot/dw_pose_controlnet](https://huggingface.co/datasets/dimitribarbot/dw_pose_controlnet) dataset at a resolution of 1024.
+
+ #### Compute
+ One machine with a single A40 GPU, for 48 hours.
+
+ #### Batch size
+ Data parallel with a single-GPU batch size of 2 and 8 gradient accumulation steps, for an effective batch size of 16.
+
+ #### Hyperparameters
+ Constant learning rate of 8e-5.
+
+ #### Mixed precision
+ fp16
custom_dw_pose.png DELETED
Binary file (65.3 kB)
 
dwpose_1.png → images/dwpose_1.png RENAMED
File without changes
dwpose_2.png → images/dwpose_2.png RENAMED
File without changes
dwpose_image_1.png → images/dwpose_image_1.png RENAMED
File without changes
dwpose_image_2.png → images/dwpose_image_2.png RENAMED
File without changes
pose_image_1.png → images/pose_image_1.png RENAMED
File without changes
pose_image_2.png → images/pose_image_2.png RENAMED
File without changes
pose.png DELETED
Binary file (13.3 kB)