---
license: cc-by-nc-sa-4.0
datasets:
- ChristophSchuhmann/improved_aesthetics_6.5plus
language:
- en
tags:
- controlnet
---
Controls image generation by edge maps generated with [Edge Drawing](https://github.com/CihanTopal/ED_Lib). Note that Edge Drawing comes in different flavors: original (_ed_), parameter-free (_edpf_), color (_edcolor_).
* Based on my monologues at [github.com - Edge Drawing](https://github.com/lllyasviel/ControlNet/discussions/318), with a detailed report and evaluations.
* For usage see the model page on [civitai.com - Model](https://civitai.com/models/149740).
* To generate edpf maps you can use [this space](https://huggingface.co/spaces/GeroldMeisinger/edpf) or [this script at gitlab.com](https://gitlab.com/-/snippets/3601881).
* For evaluation images see the corresponding .zip files under "Files".
* To run your own evaluations you can use [this script at gitlab.com](https://gitlab.com/-/snippets/3602096).
**Edge Drawing Parameter Free**

_Clear and pristine! Wooow!_
**Example**
sampler=UniPC steps=20 cfg=7.5 seed=0 batch=9 model=v1-5-pruned-emaonly.safetensors cherry-picked=1/9
prompt: _a detailed high-quality professional photo of swedish woman standing in front of a mirror, dark brown hair, white hat with purple feather_
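The checkpoint can presumably be driven from diffusers with the same settings; a minimal sketch (the repo id and file names are placeholders, not confirmed):
```
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler

# hypothetical repo id; see the civitai model page for the actual files
controlnet = ControlNetModel.from_pretrained("GeroldMeisinger/control-edgedrawing", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)  # sampler=UniPC

edge_map = Image.open("edge_map.png")  # an edpf map, e.g. from the space/script above
image = pipe(
    "a detailed high-quality professional photo of swedish woman standing in front of a mirror, dark brown hair, white hat with purple feather",
    image=edge_map,
    num_inference_steps=20,  # steps=20
    guidance_scale=7.5,      # cfg=7.5
    generator=torch.Generator("cuda").manual_seed(0),  # seed=0
).images[0]
```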

**Canny Edge for comparison (default in Automatic1111)**

_Noise, artifacts and missing edges. Yuck! Ugh!_
# Image dataset
* [laion2B-en aesthetics>=6.5 dataset](https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6.5plus)
* `--min_image_size 512 --max_aspect_ratio 2 --resize_mode="center_crop" --image_size 512`
* resulting in 180k images
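Those flags match [img2dataset](https://github.com/rom1504/img2dataset) options, so the download presumably looked something like this (a sketch; file and column names are assumptions):
```
from img2dataset import download

download(
    url_list="improved_aesthetics_6.5plus.parquet",  # assumed local copy of the laion metadata
    input_format="parquet",
    url_col="URL",
    caption_col="TEXT",
    output_folder="laion2b-en-aesthetics65",
    output_format="webdataset",
    image_size=512,
    resize_mode="center_crop",
    min_image_size=512,
    max_aspect_ratio=2,
)
```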
# Training
```
accelerate launch train_controlnet.py ^
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" ^
--output_dir="control-edgedrawing-[version]-fp16/" ^
--dataset_name="mydataset" ^
--mixed_precision="fp16" ^
--resolution=512 ^
--learning_rate=1e-5 ^
--train_batch_size=1 ^
--gradient_accumulation_steps=4 ^
--gradient_checkpointing ^
--use_8bit_adam ^
--enable_xformers_memory_efficient_attention ^
--set_grads_to_none ^
--seed=0
```
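`--dataset_name="mydataset"` implies a dataset that diffusers' `train_controlnet.py` can load with three columns. A sketch of that expectation (the column names are the script's defaults; everything else here is an assumption):
```
from datasets import load_dataset

# "mydataset" stands in for a local dataset script or Hub dataset. The script
# reads "image", "conditioning_image" and "text" by default; other names can
# be passed via --image_column / --conditioning_image_column / --caption_column.
ds = load_dataset("mydataset", split="train")
assert {"image", "conditioning_image", "text"} <= set(ds.column_names)
```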
# Evaluation
To evaluate the model it makes sense to compare it with the original Canny model. Original evaluations and comparisons are available at the [ControlNet 1.0 repo](https://github.com/lllyasviel/ControlNet), [ControlNet 1.1 repo](https://github.com/lllyasviel/ControlNet-v1-1-nightly), [ControlNet paper v1](https://arxiv.org/abs/2302.05543v1), [ControlNet paper v2](https://arxiv.org/abs/2302.05543) and the [Diffusers implementation](https://huggingface.co/takuma104/controlnet_dev/tree/main). Some points to keep in mind when comparing canny with edpf, so we don't compare apples with oranges:
* The canny 1.0 model was trained on 3M images with fp32, the canny 1.1 model on even more, while the edpf model so far is trained on only 180k-360k images with fp16.
* The canny edge detector requires parameter tuning while edpf is parameter-free (see the sketch after this list).
* Should we manually fine-tune canny to find the perfect input image, or leave it at its defaults? One could argue that "no tuning required" is the USP of edpf and we should compare at default settings, whereas canny tuning is subjective.
* Would the canny model actually benefit from an edpf pre-processor, so that we might not even need a specialized edpf model? (2023-09-25: see `eval_canny_edpf.zip`; it seems as if it doesn't work, so the edpf model may be justified.)
* When evaluating human images we need to be aware of Stable Diffusion's inherent limits, like deformed faces and hands, and not attribute them to the ControlNet.
* When evaluating style we need to be aware of the bias from the image dataset (`laion2b-en-aesthetics65`), which might tend toward generating "aesthetic" images rather than actually working "intrinsically better".
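To make the tuning point concrete: canny needs two hysteresis thresholds chosen per image, while edpf has nothing to choose (the values below are arbitrary examples; the edpf call is shown in experiment 3.0):
```
import cv2

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

# The two thresholds must be tuned per image: too low adds noise,
# too high loses edges. 100/200 are arbitrary example values.
edges = cv2.Canny(gray, 100, 200)
```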
# Versions
**Experiment 1 - 2023-09-19 - control-edgedrawing-default-drop50-fp16-checkpoint-40000**
Images converted with https://github.com/shaojunluo/EDLinePython (based on the original, non-parameter-free Edge Drawing). Default settings are:
`smoothed=False`
```
{ 'ksize' : 5
, 'sigma' : 1.0
, 'gradientThreshold': 36
, 'anchorThreshold' : 8
, 'scanIntervals' : 1
}
```
additional arguments: `--proportion_empty_prompts=0.5`.
Trained for 40000 steps with default settings => results are not good. empty prompts were probably too excessive. retry with no drops and different algorithm parameters.
Update 2023-09-22: a bug in the algorithm produces overly sparse images at default settings, see https://github.com/shaojunluo/EDLinePython/issues/4
**Experiment 2 - 2023-09-20 - control-edgedrawing-default-noisy-drop0-fp16-checkpoint-40000**
Same as experiment 1 with `smoothed=True` and `--proportion_empty_prompts=0`.
Trained for 40000 steps with default settings => results are not good. conditioning images look too noisy. investigate algorithm.
**Experiment 3.0 - 2023-09-22 - control-edgedrawing-cv480edpf-drop0-fp16-checkpoint-45000**
Conditioning images generated with [edpf.py](https://gitlab.com/-/snippets/3601881) using [opencv-contrib-python::ximgproc::EdgeDrawing](https://docs.opencv.org/4.8.0/d1/d1c/classcv_1_1ximgproc_1_1EdgeDrawing.html).
```
import cv2

# detectEdges expects an 8-bit grayscale image
image = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

ed = cv2.ximgproc.createEdgeDrawing()
params = cv2.ximgproc.EdgeDrawing.Params()
params.PFmode = True  # parameter-free mode (EDPF)
ed.setParams(params)
ed.detectEdges(image)         # void: the result is stored internally
edge_map = ed.getEdgeImage()  # retrieve the binary edge map
```
45000 steps => looks good. released as **version 0.1 on civitai**.
resuming with left-right flipped images.
**Experiment 3.1 - 2023-09-24 - control-edgedrawing-cv480edpf-drop0-fp16-checkpoint-90000**
90000 steps (45000 steps on original, 45000 steps with left-right flipped images) => quality became better, might release as 0.2 on civitai.
**Experiment 3.2 - 2023-09-24 - control-edgedrawing-cv480edpf-drop0+50-fp16-checkpoint-118000**
resumed with epoch 2 from 90000 using `--proportion_empty_prompts=0.5` => results became worse, CN didn't pick up on no-prompts (I also tried intermediate checkpoint-104000). restarting with 50% drop.
**Experiment 4.0 - 2023-09-25 - control-edgedrawing-cv480edpf-drop50-fp16-checkpoint-45000**
see experiment 3.0. restarted from 0 with `--proportion_empty_prompts=0.5` => results are not good, 50% is probably too much for 45k steps. guess mode still doesn't work and tends to produce humans. resuming until 90k with left-right flipped images in the hope it will get better with more images.
**Experiment 4.1 - 2023-09-26 - control-edgedrawing-cv480edpf-drop50-fp16-checkpoint-90000**
resumed from 45000 steps with left-right flipped images until 90000 steps => results are still not good, 50% is probably also too much for 90k steps. guess mode still doesn't work and tends to produce humans. aborting.
**Experiment 5.0 - 2023-09-28 - control-edgedrawing-cv480edpf-fastdup-fp16-checkpoint-45000**
see experiment 3. cleaned original images following the [fastdup introduction](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/cleaning-image-dataset.ipynb) resulting in:
```
180210 images in total
67854 duplicates
644 outliers
26 too dark
321 too bright
57 blurry
68621 unique removed (that's 38%!)
------
111589 unique images (x2 left-right flip)
```
restarted from 0 with left-right flipped images and `--mixed_precision="no"` to create a master release and convert to fp16 afterwards.
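A minimal sketch of such a cleaning pass, assuming the fastdup v1 API from the linked notebook (the thresholds are what I believe the notebook uses; treat all of it as an assumption, not the exact commands run here):
```
import fastdup

fd = fastdup.create(work_dir="fastdup_work", input_dir="laion2b-en-aesthetics65")
fd.run()

components = fd.connected_components()  # near-duplicate clusters
outliers = fd.outliers()                # visually isolated images
stats = fd.img_stats()                  # per-image statistics
dark = stats[stats["mean"] < 13]        # assumed notebook thresholds
bright = stats[stats["mean"] > 220.5]
blurry = stats[stats["blur"] < 50]
```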
**Experiment 6.0 - 2023-10-02 - control-edgedrawing-cv480edpf-rect-fp16-checkpoint-45000|90000|135000**
see experiment 5.0.
* resized images with the short side to 512, which gives us rectangular images instead of 512x512 squares
* included images with aspect ratio > 2
* center-cropped images to 512x(n*64) with n=8..16, which keeps them SD-compatible (see the sketch after this list)
* sorted duplicates by the `similarity` value from `laion2b-en-aesthetics65` to get the "best" `text` from all the duplicates according to laion
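The crop rule from the list above as a sketch (the helper is mine, not from any script):
```
# Short side is fixed at 512; the long side is center-cropped down to the
# nearest multiple of 64 and capped at 1024 (n=8..16), so the result stays
# SD-compatible.
def crop_long_side(long_side: int) -> int:
    n = min(max(long_side // 64, 8), 16)
    return n * 64

print(crop_long_side(700))   # 640
print(crop_long_side(1500))  # 1024
```
The fastdup cleaning pass (see experiment 5.0) now yields: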
```
183410 images in total
75686 duplicates
381 outliers
50 too dark
436 too bright
31 blurry
76288 unique removed (that's 42%!)
------
107122 unique images (x2 left-right flip)
```
1 epoch = 107122 * 2 / 4 = 53561 steps per epoch
restarted from 0 and `--mixed_precision="fp16"`.
TODO: Why did I end up with fewer images after I added more images? fastdup suddenly finds even more duplicates. Is fastdup's default threshold=0.9 too aggressive?
**Experiment 6.1 - control-edgedrawing-cv480edpf-rect-fp16-batch32-checkpoint-6696**
see experiment 6.0. restarted from 0 with `--train_batch_size=2 --gradient_accumulation_steps=16`. 1 epoch = 107122 * 2 / 32 = 6696 steps per epoch => released as **version 0.2 on civitai**.
**Experiment 6.2 - control-edgedrawing-cv480edpf-rect-fp16-batch32-drop50-checkpoint-6696**
see experiment 6.1. restarted from 0 with `--proportion_empty_prompts=0.5`.
# Ideas
* experiment with higher gradient accumulation steps
* make conceptual captions for laion
* integrate edcolor
* try to fine-tune from canny
* image dataset with better captions (cc3m)
* remove images by semantic (use only photos, paintings etc. for edge detection)
* re-train with fp32
# Questions and answers
**Q: What's the point of another edge control net anyway?**
A: 🤷