Gerold Meisinger committed
Commit 47a2196 · 1 Parent(s): 940a19a
Files changed (2):
  1. README.md +33 -0
  2. eval.zip +3 -0
README.md CHANGED
@@ -1,3 +1,36 @@
  ---
  license: cc-by-nc-sa-4.0
  ---
+
+ **Convert color images to grayscale**
+
+ See the corresponding discussion at https://github.com/lllyasviel/ControlNet/discussions/561!
+
+ I have trained a ControlNet (214244a32 drop=0.5 mp=fp16 lr=1e-5) for 1.25 epochs, using a pointwise function to convert RGB to grayscale... which effectively makes it a pointless ControlNet 🤣
+
+ I wanted to see how fast it converges on a simple linear transformation. To emphasize again: it doesn't colorize grayscale images, it desaturates color images... which you might as well do in an image editor. It's the most ineffective way to make grayscale images. But it lets us evaluate the model very easily, and we can peer into the inner workings of ControlNet a bit. It's also a good baseline for inpainting (assuming 0% masking) and tells us which artefacts to expect in the unmasked area. I chose drop=0.5 because I assumed the ControlNet would pick up the "ignore the prompt" task very quickly, similar to the desaturation task; it also lets us compare the influence of prompts, and it keeps the results comparable with inpainting. I don't think it would have converged faster without any prompts.
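+
+ Concretely, the "pointwise function" is the standard Rec. 601 luma weighting that `cv2` applies per pixel when converting RGB to grayscale (the formula is added here for reference, not part of the original README):
+
+ Y = 0.299·R + 0.587·G + 0.114·B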
+
+ # Training
+
+ ```
+ accelerate launch train_controlnet.py \
+ --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
+ --train_batch_size=4 \
+ --gradient_accumulation_steps=8 \
+ --proportion_empty_prompts=0.5 \
+ --mixed_precision="fp16" \
+ --learning_rate=1e-5 \
+ --enable_xformers_memory_efficient_attention \
+ --use_8bit_adam \
+ --set_grads_to_none \
+ --seed=0
+ ```
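+
+ Note that with these settings the effective batch size is train_batch_size × gradient_accumulation_steps = 4 × 8 = 32. To try the resulting checkpoint, here is a minimal `diffusers` inference sketch; the checkpoint path, file names and prompt are placeholders, not part of the original README:
+
+ ```
+ import torch
+ from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
+ from diffusers.utils import load_image
+
+ # placeholder path to the trained grayscale ControlNet checkpoint
+ controlnet = ControlNetModel.from_pretrained("./controlnet-grayscale", torch_dtype=torch.float16)
+ pipe = StableDiffusionControlNetPipeline.from_pretrained(
+     "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
+ ).to("cuda")
+
+ condition = load_image("input.png")  # a color image as conditioning input
+ image = pipe("a photo", image=condition, num_inference_steps=20).images[0]
+ image.save("output.png")  # should come out as the desaturated input
+ ```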
+
+ # Image dataset
+
+ * laion2B-en aesthetics>=6.5 dataset
+ * --min_image_size 512 --max_aspect_ratio 2 --resize_mode="center_crop" --image_size 512
+ * Cleaned with `fastdup` default settings
+ * Data augmented with right-left flipped images
+ * Resulting in 214244 images
+ * Converted to grayscale with `cv2` (see the sketch below)
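+
+ A minimal sketch of the per-image preprocessing described above; the exact `cv2` calls and file names are assumptions, since the README only names the tools:
+
+ ```
+ import cv2
+
+ # load one center-cropped 512x512 training image (cv2 reads BGR)
+ img = cv2.imread("image.png")
+
+ # conditioning image: pointwise grayscale conversion
+ gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
+ cv2.imwrite("image_gray.png", gray)
+
+ # augmentation: right-left (horizontal) flip doubles the dataset
+ cv2.imwrite("image_flipped.png", cv2.flip(img, 1))
+ cv2.imwrite("image_gray_flipped.png", cv2.flip(gray, 1))
+ ```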
eval.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2d834d2d9ed03a15e9be690359b7d4c337c7ff18e46a55572d45793871902002
+ size 262594427