|
--- |
|
license: cc-by-nc-sa-4.0 |
|
datasets: |
|
- ChristophSchuhmann/improved_aesthetics_6.5plus |
|
language: |
|
- en |
|
tags: |
|
- controlnet |
|
--- |
|
|
|
Controls image generation with edge maps generated by [Edge Drawing](https://github.com/CihanTopal/ED_Lib). Note that Edge Drawing comes in different flavors: original (_ed_), parameter-free (_edpf_) and color (_edcolor_).
|
|
|
* Based on my monologues at [github.com - Edge Drawing](https://github.com/lllyasviel/ControlNet/discussions/318), with a detailed report and evaluations.

* For usage, see the model page on [civitai.com - Model](https://civitai.com/models/149740).

* To generate edpf maps, you can use [this space](https://huggingface.co/spaces/GeroldMeisinger/edpf) or [this script at gitlab.com](https://gitlab.com/-/snippets/3601881).

* For evaluation images, see the corresponding .zip archives under "Files and versions".

* To run your own evaluations, you can use [this script at gitlab.com](https://gitlab.com/-/snippets/3602096).
|
|
|
**Edge Drawing Parameter Free** |
|
|
|
 |
|
|
|
_Clear and pristine! Wooow!_ |
|
|
|
**Example** |
|
|
|
sampler=UniPC, steps=20, cfg=7.5, seed=0, batch=9, model=v1-5-pruned-emaonly.safetensors, cherry-picked: 1/9
|
|
|
prompt: _a detailed high-quality professional photo of swedish woman standing in front of a mirror, dark brown hair, white hat with purple feather_ |
|
|
|
 |
|
|
|
**Canny Edge for comparison (default in Automatic1111)**
|
|
|
 |
|
|
|
_Noise, artifacts and missing edges. Yuck! Ugh!_ |
|
|
|
# Image dataset |
|
|
|
* [laion2B-en aesthetics>=6.5 dataset](https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6.5plus) |
|
* `--min_image_size 512 --max_aspect_ratio 2 --resize_mode="center_crop" --image_size 512` (see the download sketch below)
|
* resulting in 180k images |
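
These filter flags match [img2dataset](https://github.com/rom1504/img2dataset), so presumably the images were fetched with it. A minimal download sketch under that assumption (the parquet filename and output folder are placeholders):

```
img2dataset --url_list="improved_aesthetics_6.5plus.parquet" --input_format="parquet" ^
    --url_col="URL" --caption_col="TEXT" --output_format="files" ^
    --output_folder="laion2b-en-aesthetics65" ^
    --min_image_size 512 --max_aspect_ratio 2 --resize_mode="center_crop" --image_size 512
```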
|
|
|
# Training |
|
|
|
``` |
|
accelerate launch train_controlnet.py ^ |
|
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" ^ |
|
--output_dir="control-edgedrawing-[version]-fp16/" ^ |
|
--dataset_name="mydataset" ^ |
|
--mixed_precision="fp16" ^ |
|
--resolution=512 ^ |
|
--learning_rate=1e-5 ^ |
|
--train_batch_size=1 ^ |
|
--gradient_accumulation_steps=4 ^ |
|
--gradient_checkpointing ^ |
|
--use_8bit_adam ^ |
|
--enable_xformers_memory_efficient_attention ^ |
|
--set_grads_to_none ^ |
|
--seed=0 |
|
``` |
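
Note: the `^` line continuations are Windows `cmd` syntax; on Linux use `\` instead. With `--train_batch_size=1` and `--gradient_accumulation_steps=4` this trains at an effective batch size of 4, which the per-epoch step counts below divide by.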
|
|
|
# Evaluation |
|
|
|
To evaluate the model it makes sense to compare it with the original Canny model. Original evaluations and comparisons are available at [ControlNet 1.0 repo](https://github.com/lllyasviel/ControlNet), [ControlNet 1.1 repo](https://github.com/lllyasviel/ControlNet-v1-1-nightly), [ControlNet paper v1](https://arxiv.org/abs/2302.05543v1), [ControlNet paper v2](https://arxiv.org/abs/2302.05543) and the [Diffusers implementation](https://huggingface.co/takuma104/controlnet_dev/tree/main). Some points to keep in mind when comparing canny with edpf, so that we don't compare apples with oranges:
|
* The canny 1.0 model was trained on 3M images with fp32, and the canny 1.1 model on even more, while the edpf model has so far only been trained on 180k-360k images with fp16.

* The canny edge detector requires parameter tuning, while edpf is parameter-free.

* Should we manually fine-tune canny to find the perfect input image, or leave it at its defaults? One could argue that "no fine-tuning required" is the unique selling point of edpf and that we should therefore compare against default settings, whereas canny fine-tuning is subjective.

* Would the canny model actually benefit from an edpf pre-processor, so that we might not even need a specialized edpf model? (2023-09-25: see `eval_canny_edpf.zip`, but it seems as if it doesn't work, so the edpf model may be justified. A sketch of this cross-test follows this list.)

* When evaluating human images we need to be aware of Stable Diffusion's inherent limits, like deformed faces and hands, and not attribute them to the ControlNet.

* When evaluating style we need to be aware of the bias from the image dataset (`laion2b-en-aesthetics65`), which might tend towards generating "aesthetic" images rather than working "intrinsically better".
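
Regarding the cross-test in the fourth point: a minimal sketch with diffusers, assuming the published canny checkpoint and an edpf map saved as `edpf_map.png` (both placeholders for whatever you evaluate), matching the example settings above (UniPC, 20 steps, cfg 7.5, seed 0):

```
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from diffusers.utils import load_image

# feed an edpf edge map into the *canny* ControlNet instead of a canny map
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

control = load_image("edpf_map.png")  # edge map from the edpf script, not from canny
image = pipe(
    "a detailed high-quality professional photo of a woman standing in front of a mirror",
    image=control,
    num_inference_steps=20,
    guidance_scale=7.5,
    generator=torch.manual_seed(0),
).images[0]
```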
|
|
|
# Versions |
|
|
|
**Experiment 1 - 2023-09-19 - control-edgedrawing-default-drop50-fp16-checkpoint-40000** |
|
|
|
Images converted with https://github.com/shaojunluo/EDLinePython (based on the original, non-parameter-free Edge Drawing). Default settings are:
|
|
|
`smoothed=False` |
|
|
|
``` |
|
{ 'ksize' : 5 |
|
, 'sigma' : 1.0 |
|
, 'gradientThreshold': 36 |
|
, 'anchorThreshold' : 8 |
|
, 'scanIntervals' : 1 |
|
} |
|
``` |
|
|
|
additional arguments: `--proportion_empty_prompts=0.5`. |
|
|
|
Trained for 40000 steps with default settings => results are not good. The 50% empty-prompt proportion was probably too high. Retrying with no prompt drop and different algorithm parameters.
|
|
|
Update 2023-09-22: a bug in the algorithm produces overly sparse edge maps with the default settings, see https://github.com/shaojunluo/EDLinePython/issues/4
|
|
|
**Experiment 2 - 2023-09-20 - control-edgedrawing-default-noisy-drop0-fp16-checkpoint-40000** |
|
|
|
Same as experiment 1 with `smoothed=True` and `--proportion_empty_prompts=0`. |
|
|
|
Trained for 40000 steps with default settings => results are not good. The conditioning images look too noisy. Investigating the algorithm.
|
|
|
**Experiment 3.0 - 2023-09-22 - control-edgedrawing-cv480edpf-drop0-fp16-checkpoint-45000** |
|
|
|
Conditioning images generated with [edpf.py](https://gitlab.com/-/snippets/3601881) using [opencv-contrib-python::ximgproc::EdgeDrawing](https://docs.opencv.org/4.8.0/d1/d1c/classcv_1_1ximgproc_1_1EdgeDrawing.html). |
|
|
|
```
import cv2

image = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # example input; EdgeDrawing expects grayscale

ed = cv2.ximgproc.createEdgeDrawing()
params = cv2.ximgproc.EdgeDrawing.Params()
params.PFmode = True  # parameter-free mode (edpf)
ed.setParams(params)
ed.detectEdges(image)         # returns None; the result is stored internally
edge_map = ed.getEdgeImage()  # retrieve the binary edge map
```
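
`PFmode=True` selects the parameter-free variant (edpf); with the default `PFmode=False` you get the original _ed_ with manually tunable thresholds.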
|
|
|
45000 steps => looks good. Released as **version 0.1 on civitai**.
|
|
|
Resuming with left-right flipped images.
|
|
|
**Experiment 3.1 - 2023-09-24 - control-edgedrawing-cv480edpf-drop0-fp16-checkpoint-90000** |
|
|
|
90000 steps (45000 steps on original images, 45000 steps with left-right flipped images) => quality improved, might release as 0.2 on civitai.
|
|
|
**Experiment 3.2 - 2023-09-24 - control-edgedrawing-cv480edpf-drop0+50-fp16-checkpoint-118000**
|
|
|
Resumed epoch 2 from 90000 steps using `--proportion_empty_prompts=0.5` => results became worse, the ControlNet didn't pick up on empty prompts (I also tried the intermediate checkpoint-104000). Restarting from scratch with 50% prompt drop.
|
|
|
**Experiment 4.0 - 2023-09-25 - control-edgedrawing-cv480edpf-drop50-fp16-checkpoint-45000** |
|
|
|
See experiment 3.0. Restarted from 0 with `--proportion_empty_prompts=0.5` => results are not good, 50% is probably too much for 45k steps. Guess mode still doesn't work and tends to produce humans. Resuming until 90k with left-right flipped images in the hope that it gets better with more images.
|
|
|
**Experiment 4.1 - 2023-09-26 - control-edgedrawing-cv480edpf-drop50-fp16-checkpoint-90000** |
|
|
|
Resumed from 45000 steps with left-right flipped images until 90000 steps => results are still not good, 50% is probably also too much for 90k steps. Guess mode still doesn't work and tends to produce humans. Aborting.
|
|
|
**Experiment 5.0 - 2023-09-28 - control-edgedrawing-cv480edpf-fastdup-fp16-checkpoint-45000** |
|
|
|
See experiment 3.0. Cleaned the original images following the [fastdup introduction](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/cleaning-image-dataset.ipynb), resulting in:
|
``` |
|
180210 images in total |
|
67854 duplicates |
|
644 outliers |
|
26 too dark |
|
321 too bright |
|
57 blurry |
|
68621 unique removed (that's 38%!) |
|
------ |
|
111589 unique images (x2 left-right flip) |
|
``` |
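
The cleaning pass itself, sketched minimally. This assumes fastdup's v1 API as used in the notebook linked above; the directory names are placeholders and the dark/bright/blur thresholds are the notebook's defaults, not values confirmed for this run:

```
import fastdup

fd = fastdup.create(work_dir="fastdup_work/", input_dir="images/")
fd.run(ccthreshold=0.9)  # default similarity threshold (see TODO under experiment 6.0)

# duplicates: clusters of near-identical images, keep one representative each
cc_df, _ = fd.connected_components()

# outliers and low-quality images, thresholds as in the notebook
outliers_df = fd.outliers()
outliers_df = outliers_df[outliers_df.distance < 0.68]
stats_df = fd.img_stats()
dark   = stats_df[stats_df["mean"] < 13]
bright = stats_df[stats_df["mean"] > 220.5]
blurry = stats_df[stats_df["blur"] < 50]
```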
|
|
|
Restarted from 0 with left-right flipped images and `--mixed_precision="no"` to create a master release and convert to fp16 afterwards.
|
|
|
**Experiment 6.0 - 2023-10-02 - control-edgedrawing-cv480edpf-rect-fp16-checkpoint-45000|90000|135000** |
|
|
|
see experiment 5.0. |
|
* resized images with the short side to 512, which gives rectangular images instead of 512x512 squares

* included images with aspect ratio > 2

* center-cropped images to 512x(n*64) with n=8..16, which keeps them SD-compatible (see the crop sketch below)

* sorted duplicates by the `similarity` value from `laion2b-en-aesthetics65` to keep the "best" `text` among all duplicates, according to laion
|
|
|
``` |
|
183410 images in total |
|
75686 duplicates |
|
381 outliers |
|
50 too dark |
|
436 too bright |
|
31 blurry |
|
76288 unique removed (that's 42%!) |
|
------ |
|
107122 unique images (x2 left-right flip) |
|
``` |
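
A minimal sketch of that rectangular crop (a hypothetical helper, not the actual preprocessing script):

```
import cv2

def center_crop_sd(image, short_side=512, step=64, max_long=1024):
    # resize the short side to 512
    h, w = image.shape[:2]
    scale = short_side / min(h, w)
    image = cv2.resize(image, (round(w * scale), round(h * scale)), interpolation=cv2.INTER_AREA)
    # center-crop the long side to a multiple of 64, capped at 1024 (= 16 * 64, i.e. aspect ratio 2)
    h, w = image.shape[:2]
    if h >= w:  # portrait: crop the height
        target = min(h // step * step, max_long)
        top = (h - target) // 2
        return image[top:top + target, :]
    target = min(w // step * step, max_long)  # landscape: crop the width
    left = (w - target) // 2
    return image[:, left:left + target]
```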
|
|
|
1 epoch = 107122 * 2 / 4 = 53561 steps per epoch |
|
|
|
Restarted from 0 with `--mixed_precision="fp16"`.
|
|
|
TODO: Why did I end up with fewer images after adding more images? fastdup suddenly finds even more duplicates. Is fastdup's default threshold=0.9 too aggressive?
|
|
|
**Experiment 6.1 - control-edgedrawing-cv480edpf-rect-fp16-batch32-checkpoint-6696** |
|
|
|
See experiment 6.0. Restarted from 0 with `--train_batch_size=2 --gradient_accumulation_steps=16` (effective batch size 32). 1 epoch = 107122 * 2 / 32 = 6696 steps per epoch => released as **version 0.2 on civitai**.
|
|
|
**Experiment 6.2 - control-edgedrawing-cv480edpf-rect-fp16-batch32-drop50-checkpoint-6696** |
|
|
|
See experiment 6.1. Restarted from 0 with `--proportion_empty_prompts=0.5`.
|
|
|
# Ideas |
|
|
|
* experiment with higher gradient accumulation steps |
|
* make conceptual captions for laion |
|
* integrate edcolor |
|
* try to fine-tune from canny |
|
* image dataset with better captions (cc3m) |
|
* remove images by semantics (use only photos, paintings, etc. for edge detection)
|
* re-train with fp32 |
|
|
|
# Question and answers |
|
|
|
**Q: What's the point of another edge control net anyway?** |
|
|
|
A: 🤷 |
|
|