alea31415 committed on
Commit 4997da9 · 1 Parent(s): 08f4713

Update README.md

Files changed (1)
  1. README.md +21 -10
README.md CHANGED
@@ -2,6 +2,8 @@
  license: creativeml-openrail-m
  ---

  ### Trigger words

  ```
@@ -14,23 +16,24 @@ For `0324_all_aniscreen_tags`, I accidentally tag all the character images with
  For `0325_aniscreen_fanart_styles`, things are done correctly (anime screenshots tagged as `aniscreen`, fanart tagged as `fanart`).


- ### Settings

  Default settings are:
  - loha net dim 8, conv dim 4, alpha 1
- - lr 2e-4 constant scheduler throuout
  - Adam8bit
  - resolution 512
  - clip skip 1

  The file names indicate how each setting is changed with respect to this default setup.
- The configuration json files can otherwsie be found in the `config` subdirectories that lies in each folder.
  However, some experiments concern the effect of tags; for these I regenerate the txt files, so the difference cannot be seen from the configuration file.
  For now this only concerns `05tag`, in which tags are only used with probability 0.5.

  ### Some observations

- For a thorough comparaison please refer to the `generated_samples` folder.

  #### Captioning

@@ -44,13 +47,13 @@ Having all the tags (bottom three rows) remove the traits from subjects if these

  #### The effect of style images on characters

- I do beleive regularization images are important, far more important than tweaking any hyperparameters. They slow down training but also make sure that the undesired aspect are less baked into the model if we have images of other types, even if they are not for the subjects we train for.

  Comparing the models trained with and without style images, we can see that models trained with general style images have less anime style baked in. The difference is particularly clear for Tilty, who only has anime screenshots for training.

  ![00103-20230327084923](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00103-20230327084923.png)

- On the other hand, the default clothes seem to be better trained when there is no regularization image. While this may seem beneficial, it is worth noticing that I keep all the output tags. Therefore, in a sense we only want to get the outputs when we prompt them explicitly. The magic of having the trigger words to fill in what is not in caption seems to be more pronouncing when we have regularization images. In any case, this magic will not work forever as we will eventually start overfitting. The following image show that we get images that are much closer after putting clothes in prompts.

  ![00105-20230327090703](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00105-20230327090703.png)

@@ -72,7 +75,7 @@ For example, if you want better background it can be simpler to switch the model

  This is one of the most debated topics in LoRA training.
  Both the original paper and the initial implementation of LoRA for SD suggest using quite small ranks.
- However, the 128 dim/alpha became the unfortunate default in many implementations for some times, which resulted in files with more than 100mb.
  Ever since LoCon was introduced, we have again advocated the use of smaller dimensions and set the default value of alpha to 1.

  As for LoHa, I have been insisting that the values I use here (net dim 8, conv dim 4, alpha 1) should be more than enough in most cases.
@@ -123,8 +126,16 @@ Since the outputs of Dadaptation seems to change more over time, I guess it may

  It is often suggested to set the text encoder learning rate lower than that of the unet.
  This of course makes training slower, while the benefit is hard to evaluate.
- In one experiment I half the text encoder learning rate and train the model two times longer.
- After spending some time here are two situations that reveal the potential benefit of this practice.

  - In my training set I have anime screenshots, tagged with `aniscreen`, and fanart, tagged with `fanart`.
  Although they are balanced to have the same weight, the consistency of anime screenshots seems to drive the characters toward this style by default.
@@ -139,7 +150,7 @@ This aspect is difficult to test, but it seems to be confirmed by this "umbrella

  ![00083-20230327015201](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00083-20230327015201.png)

- There may be some disadvantages as well but his needs to be further explored.
  In any case, I still believe that to get the best results we should avoid text encoder training completely and do [pivotal tuning](https://github.com/cloneofsimo/lora/discussions/121) instead.

 
  license: creativeml-openrail-m
  ---

+ **General advice: Having a good dataset is more important than anything else**
+
  ### Trigger words

  ```
 
  For `0325_aniscreen_fanart_styles`, things are done correctly (anime screenshots tagged as `aniscreen`, fanart tagged as `fanart`).


+ ### Setting

  Default settings are:
  - loha net dim 8, conv dim 4, alpha 1
+ - lr 2e-4 constant scheduler throughout
  - Adam8bit
  - resolution 512
  - clip skip 1
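
The default setup above can be written down as a plain dictionary, which makes it easy to list what a given run changes relative to it. This is only a sketch: all key names are hypothetical and are not read from the actual configuration JSON files, and `text_encoder_lr` in the example is an illustrative extra key.

```python
# Hypothetical key names; the real configuration JSON files may use different ones.
DEFAULTS = {
    "algo": "loha",
    "net_dim": 8,
    "conv_dim": 4,
    "alpha": 1,
    "lr": 2e-4,
    "lr_scheduler": "constant",
    "optimizer": "Adam8bit",
    "resolution": 512,
    "clip_skip": 1,
}


def changed_settings(config: dict) -> dict:
    """Return {key: (default, value)} for every key whose value differs from DEFAULTS.

    Keys absent from `config` are treated as left at their default.
    Keys absent from DEFAULTS are reported with a default of None.
    """
    diff = {}
    for key in sorted(DEFAULTS.keys() | config.keys()):
        default = DEFAULTS.get(key)
        value = config.get(key, default)
        if value != default:
            diff[key] = (default, value)
    return diff


if __name__ == "__main__":
    # A run that only adds a (hypothetical) halved text encoder learning rate.
    run = dict(DEFAULTS, text_encoder_lr=DEFAULTS["lr"] / 2)
    print(changed_settings(run))  # {'text_encoder_lr': (None, 0.0001)}
```

Comparing against one explicit default dictionary mirrors how the file names encode only the deviations from the default setup.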

  The file names indicate how each setting is changed with respect to this default setup.
+ Otherwise, the configuration JSON files can be found in the `config` subdirectory of each folder.
  However, some experiments concern the effect of tags; for these I regenerate the txt files, so the difference cannot be seen from the configuration file.
  For now this only concerns `05tag`, in which tags are only used with probability 0.5.
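
The `05tag` captions could be regenerated along the following lines. This is a sketch under assumptions: "used with probability 0.5" is read here as keeping each tag independently with probability 0.5, the first comma-separated tag is assumed to be the trigger word and is always kept, and `drop_tags` / `regenerate_captions` are hypothetical helpers. Because the txt files are regenerated once, the surviving tags stay fixed for the whole training run.

```python
import random
from pathlib import Path
from typing import Optional


def drop_tags(caption: str, keep_prob: float = 0.5, n_fixed: int = 1,
              rng: Optional[random.Random] = None) -> str:
    """Keep the first `n_fixed` comma-separated tags (assumed trigger words)
    and keep each remaining tag independently with probability `keep_prob`."""
    rng = rng or random.Random()
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    fixed, rest = tags[:n_fixed], tags[n_fixed:]
    kept = [t for t in rest if rng.random() < keep_prob]
    return ", ".join(fixed + kept)


def regenerate_captions(src_dir: str, dst_dir: str,
                        keep_prob: float = 0.5, seed: int = 0) -> None:
    """Write a tag-dropped copy of every .txt caption in src_dir to dst_dir."""
    rng = random.Random(seed)  # fixed seed so the regenerated files are reproducible
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for txt in sorted(Path(src_dir).glob("*.txt")):
        caption = txt.read_text(encoding="utf-8")
        (dst / txt.name).write_text(drop_tags(caption, keep_prob, rng=rng),
                                    encoding="utf-8")
```

For example, `regenerate_captions("captions", "captions_05tag")` (hypothetical folder names) would write the dropped-tag copies next to the originals, leaving the training configuration itself unchanged.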

  ### Some observations

+ For a thorough comparison please refer to the `generated_samples` folder.
+

  #### Captioning

 

  #### The effect of style images on characters

+ I do believe regularization images are important, far more important than tweaking any hyperparameters. They slow down training, but they also make sure that undesired aspects are less baked into the model when we have images of other types, even if these images are not of the subjects we train on.

  Comparing the models trained with and without style images, we can see that models trained with general style images have less anime style baked in. The difference is particularly clear for Tilty, who only has anime screenshots for training.

  ![00103-20230327084923](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00103-20230327084923.png)

+ On the other hand, the default clothes seem to be better trained when there are no regularization images. While this may seem beneficial, it is worth noticing that I keep all the output tags. Therefore, in a sense we only want to get a certain outfit when we prompt for it explicitly. The magic of the trigger words filling in what is not in the caption seems to be more pronounced when we have regularization images. In any case, this magic will not work forever, as we will eventually start overfitting. The following image shows that we get images that are much closer after putting the clothes in the prompt.

  ![00105-20230327090703](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00105-20230327090703.png)

 

  This is one of the most debated topics in LoRA training.
  Both the original paper and the initial implementation of LoRA for SD suggest using quite small ranks.
+ However, 128 dim/alpha became the unfortunate default in many implementations for some time, resulting in files larger than 100 MB.
  Ever since LoCon was introduced, we have again advocated the use of smaller dimensions and set the default value of alpha to 1.
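
To see why the rank matters so much for file size: a LoRA on a linear layer with input size `d_in` and output size `d_out` stores a down matrix of shape `rank × d_in` and an up matrix of shape `d_out × rank`, so its parameter count grows linearly with the rank. A small sketch, assuming fp16 storage (2 bytes per parameter) and one illustrative layer shape rather than the full list of SD layers:

```python
def lora_linear_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters of one LoRA pair: down (rank x d_in) plus up (d_out x rank)."""
    return rank * (d_in + d_out)


def size_mb(n_params: int, bytes_per_param: int = 2) -> float:
    """Storage size in MB, assuming fp16 weights (2 bytes each) by default."""
    return n_params * bytes_per_param / 1e6


if __name__ == "__main__":
    d_in, d_out = 768, 320  # an illustrative layer shape, not a full model
    for rank in (8, 32, 128):
        n = lora_linear_params(d_in, d_out, rank)
        print(f"rank {rank:>3}: {n:>7} params, {size_mb(n):.3f} MB")
```

Whatever the layer shapes, going from rank 8 to rank 128 multiplies the size of every LoRA layer by 16. Alpha is only a scalar per module, so it does not affect file size.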

  As for LoHa, I have been insisting that the values I use here (net dim 8, conv dim 4, alpha 1) should be more than enough in most cases.
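
For reference, here is a minimal NumPy sketch of how a LoHa update is commonly formed: two low-rank products combined with an elementwise (Hadamard) product and scaled by `alpha / dim`. The `alpha / dim` scaling is the convention assumed here, as in LoRA-style trainers, and the shapes are illustrative. Conv layers (conv dim 4) would apply the same idea to reshaped kernels. With net dim 8 and alpha 1, the update is scaled by 1/8.

```python
import numpy as np


def loha_delta(w1a, w1b, w2a, w2b, alpha: float, dim: int) -> np.ndarray:
    """LoHa weight update: (w1a @ w1b) * (w2a @ w2b), scaled by alpha / dim."""
    return (w1a @ w1b) * (w2a @ w2b) * (alpha / dim)


def loha_params(d_in: int, d_out: int, dim: int) -> int:
    """Two down/up pairs, each storing dim * (d_in + d_out) numbers."""
    return 2 * dim * (d_in + d_out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_out, d_in, dim, alpha = 320, 768, 8, 1  # net dim 8, alpha 1 as in the defaults
    w1a, w2a = rng.normal(size=(d_out, dim)), rng.normal(size=(d_out, dim))
    w1b, w2b = rng.normal(size=(dim, d_in)), rng.normal(size=(dim, d_in))
    delta = loha_delta(w1a, w1b, w2a, w2b, alpha, dim)
    print(delta.shape)                    # (320, 768)
    print(np.linalg.matrix_rank(delta))   # at most dim * dim = 64
    print(loha_params(d_in, d_out, dim))  # same count as a rank-16 LoRA on this layer
```

This is one reason small LoHa dims can go a long way: the Hadamard product of two rank-8 matrices can reach rank 64, while a LoRA with the same number of parameters is capped at rank 16.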
 

  It is often suggested to set the text encoder learning rate lower than that of the unet.
  This of course makes training slower, while the benefit is hard to evaluate.
+ To begin, let me show how it actually slows down training. Contrary to common belief, it affects style training more than character training. I halve the text encoder learning rate in the following experiments.
+
+ - This is what you get for characters. If the trigger words are set properly you barely see a difference, let alone in the single-character training that most people focus on. The interesting point, however, is the blending between Mahiro and Mihari caused by the shared `Oyama` in their trigger words. A larger text encoder learning rate reduces this blending faster.
+ ![00106-20230327112316](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00106-20230327112316.png)
+
+ - For styles, you can see that a lower text encoder learning rate does make training slower (the largest differences are for ke-ta and momoko).
+ ![00107-20230327112855](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00107-20230327112855.png)
+ ![00017-20230325211523](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00017-20230325211523.png)
+
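
One way to see why halving the text encoder learning rate slows its part of training down: with Adam, the size of each update is set mainly by the learning rate rather than by the gradient magnitude, so halving the learning rate roughly halves how far the text encoder weights move per step. Below is a toy sketch with plain Adam on scalar parameters (not the Adam8bit used in these experiments); `GroupedAdam` and the group names are hypothetical. In PyTorch, the same effect comes from passing the optimizer a list of parameter-group dicts, each with its own `"lr"`.

```python
import math


class GroupedAdam:
    """Minimal Adam over scalar parameters with one learning rate per group."""

    def __init__(self, groups, beta1=0.9, beta2=0.999, eps=1e-8):
        self.groups = groups  # {name: {"lr": float, "params": [float, ...]}}
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.t = 0
        self.m = {name: [0.0] * len(g["params"]) for name, g in groups.items()}
        self.v = {name: [0.0] * len(g["params"]) for name, g in groups.items()}

    def step(self, grads):
        """Apply one Adam update; grads maps group name -> list of gradients."""
        self.t += 1
        for name, group in self.groups.items():
            m, v = self.m[name], self.v[name]
            for i, g in enumerate(grads[name]):
                m[i] = self.beta1 * m[i] + (1 - self.beta1) * g
                v[i] = self.beta2 * v[i] + (1 - self.beta2) * g * g
                m_hat = m[i] / (1 - self.beta1 ** self.t)  # bias correction
                v_hat = v[i] / (1 - self.beta2 ** self.t)
                group["params"][i] -= group["lr"] * m_hat / (math.sqrt(v_hat) + self.eps)


if __name__ == "__main__":
    unet_lr = 2e-4
    groups = {
        "unet": {"lr": unet_lr, "params": [0.0]},
        "text_encoder": {"lr": unet_lr / 2, "params": [0.0]},  # halved
    }
    opt = GroupedAdam(groups)
    for _ in range(100):
        # identical constant gradients, so only the learning rates differ
        opt.step({"unet": [0.01], "text_encoder": [0.01]})
    moved_unet = -groups["unet"]["params"][0]
    moved_te = -groups["text_encoder"]["params"][0]
    print(f"unet moved {moved_unet:.6f}, text encoder moved {moved_te:.6f}")
```

Under the constant scheduler used here, training twice as long with the halved text encoder learning rate gives the text encoder the same cumulative learning rate (learning rate times steps) as a default-length run, while the unet gets twice as much.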
+ In total, I train the model twice as long. After spending some time on this, here are two situations that reveal the potential benefit of a smaller text encoder learning rate.

  - In my training set I have anime screenshots, tagged with `aniscreen`, and fanart, tagged with `fanart`.
  Although they are balanced to have the same weight, the consistency of anime screenshots seems to drive the characters toward this style by default.
 

  ![00083-20230327015201](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00083-20230327015201.png)

+ There may be other disadvantages besides slower training, but this needs to be explored further.
  In any case, I still believe that to get the best results we should avoid text encoder training completely and do [pivotal tuning](https://github.com/cloneofsimo/lora/discussions/121) instead.
