Commit 1495b68
Parent(s): 68ef273
Update README.md

README.md CHANGED
@@ -89,7 +89,7 @@ image.save("fantasy_forest_illustration.png")
 - [Sygil Diffusion v0.2](https://huggingface.co/Sygil/Sygil-Diffusion/blob/main/sygil-diffusion-v0.2.ckpt): Resumed from Sygil Diffusion v0.1 and trained for a total of 1.77 million steps.
 - [Sygil Diffusion v0.3](https://huggingface.co/Sygil/Sygil-Diffusion/blob/main/sygil-diffusion-v0.3.ckpt): Resumed from Sygil Diffusion v0.2 and trained for a total of 2.01 million steps so far.
 - #### Beta:
-- [sygil-diffusion-v0.
+- [sygil-diffusion-v0.4_2318263_lora.ckpt](https://huggingface.co/Sygil/Sygil-Diffusion/blob/main/sygil-diffusion-v0.4_2318263_lora.ckpt): Resumed from Sygil Diffusion v0.3 and trained for a total of 2.31 million steps so far.

 Note: Checkpoints under the Beta section are updated daily, or at least 3-4 times a week. This is usually the equivalent of 1-2 training sessions;
 this is done until they are stable enough to be moved into a proper release, usually every 1 or 2 weeks.
@@ -105,14 +105,14 @@ The model was trained on the following dataset:

 **Hardware and others**
 - **Hardware:** 1 x Nvidia RTX 3050 8GB GPU
-- **Hours Trained:**
+- **Hours Trained:** 840 hours approximately.
 - **Optimizer:** AdamW
 - **Adam Beta 1**: 0.9
 - **Adam Beta 2**: 0.999
 - **Adam Weight Decay**: 0.01
 - **Adam Epsilon**: 1e-8
 - **Gradient Checkpointing**: True
-- **Gradient Accumulations**:
+- **Gradient Accumulations**: 400
 - **Batch:** 1
 - **Learning Rate:** 1e-7
 - **Learning Rate Scheduler:** cosine_with_restarts
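As an illustration only (not part of the commit), the following is a minimal sketch of how the optimizer and accumulation settings listed in the hunk above could be reproduced with plain PyTorch. The placeholder model, the restart period `T_0`, and the dummy loss are assumptions; the model card only names `cosine_with_restarts`, for which PyTorch's `CosineAnnealingWarmRestarts` is used here as a stand-in.

```python
# Hypothetical sketch reproducing the hyperparameters listed in the model card.
# The tiny placeholder model, the restart period T_0 and the dummy loss are
# assumptions; they are not stated in the commit.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = torch.nn.Linear(8, 8)  # stand-in for the fine-tuned UNet / text encoder

optimizer = AdamW(
    model.parameters(),
    lr=1e-7,             # Learning Rate
    betas=(0.9, 0.999),  # Adam Beta 1 / Adam Beta 2
    weight_decay=0.01,   # Adam Weight Decay
    eps=1e-8,            # Adam Epsilon
)

# "cosine_with_restarts" scheduler; CosineAnnealingWarmRestarts is used as a
# stand-in here, and T_0 (steps per restart cycle) is an assumed value.
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10_000)

grad_accum_steps = 400  # Gradient Accumulations
batch_size = 1          # Batch
# Gradient Checkpointing: True would be enabled on the actual UNet/text encoder
# (memory-saving recomputation of activations); it is not shown for this toy model.

# One simulated accumulation cycle on random data, just to show the update pattern:
# gradients from 400 micro-batches of size 1 are summed before a single optimizer step.
for _ in range(grad_accum_steps):
    x = torch.randn(batch_size, 8)
    loss = model(x).pow(2).mean()        # dummy loss standing in for the diffusion loss
    (loss / grad_accum_steps).backward()

optimizer.step()
optimizer.zero_grad()
scheduler.step()
print("learning rate after one accumulated step:", scheduler.get_last_lr()[0])
```

With a batch size of 1 and 400 gradient accumulations, each parameter update effectively sees 400 samples, which is how a single 8GB RTX 3050 can emulate a much larger batch.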
@@ -120,7 +120,15 @@ The model was trained on the following dataset:
 - **Lora unet Learning Rate**: 1e-7
 - **Lora Text Encoder Learning Rate**: 1e-7
 - **Resolution**: 512 pixels
-- **Total Training Steps:** 2,
+- **Total Training Steps:** 2,318,263
+
+
+Note: For the learning rate I'm testing something new. After switching from the `constant` scheduler to `cosine_with_restarts` once v0.3 was released, I noticed
+it settles close to the optimal learning rate while minimizing the loss value, so when a training session finishes I start the next session from the latest
+learning rate value reported over the last few steps of the previous session; over time this makes the rate decrease at a steady pace. When I add a lot of data to the training dataset
+at once, I move the learning rate back to 1e-7, and the scheduler then decays it again as it learns from the new data, so the training
+neither overfits nor runs at a learning rate so low that the model stops learning anything new for a while.
+

 Developed by: [ZeroCool94](https://github.com/ZeroCool940711) at [Sygil-Dev](https://github.com/Sygil-Dev/)
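The note added in the hunk above describes a manual hand-off of the learning rate between training sessions. Below is a hypothetical sketch of that bookkeeping, again using PyTorch's `CosineAnnealingWarmRestarts` as a stand-in for `cosine_with_restarts`; the `start_session` helper, the `T_0` value, and the `large_dataset_update` flag are illustrative names, not from the commit.

```python
# Hypothetical sketch of the learning-rate hand-off described in the note above:
# each session resumes from the last LR reported by the previous session, unless a
# large amount of new data was added, in which case the LR snaps back to 1e-7.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

BASE_LR = 1e-7  # the "Learning Rate" listed in the model card

def start_session(model, last_lr=None, large_dataset_update=False):
    """Build the optimizer/scheduler for one training session (illustrative helper)."""
    lr = BASE_LR if (last_lr is None or large_dataset_update) else last_lr
    optimizer = AdamW(model.parameters(), lr=lr,
                      betas=(0.9, 0.999), weight_decay=0.01, eps=1e-8)
    scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10_000)  # T_0 is assumed
    return optimizer, scheduler

model = torch.nn.Linear(8, 8)  # stand-in for the fine-tuned weights

# Session 1 starts at the base LR; the last value it reports seeds session 2.
opt, sched = start_session(model)
# ... training steps for session 1 would run here ...
last_lr = sched.get_last_lr()[0]

# Session 2 resumes from that LR, so the rate keeps decaying across sessions.
opt, sched = start_session(model, last_lr=last_lr)

# After adding a lot of new data at once, reset to 1e-7 and let the scheduler
# decay it again as the model learns from the new samples.
opt, sched = start_session(model, last_lr=last_lr, large_dataset_update=True)
```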