Crosstyan committed on
Commit
ea4306f
·
1 Parent(s): 7395ca4

Update README.md

  1. README.md +15 -1
README.md CHANGED
@@ -24,7 +24,21 @@ Here it is, the BPModel, a Stable Diffusion model you may love or hate.
  Trained with 5k high quality images that suit my taste (not necessary yours unfortunately) from [Sankaku Complex](https://chan.sankakucomplex.com) with annotations. Not the best strategy since pure combination of tags may not be the optimal way to describe the image, but I don't need to do extra work. And no, I won't feed any AI generated image
  to the model even it might outlaw the model from being used in some countries.
 
- The training of a high resolution model requires a significant amount of GPU hours and can be costly. In this particular case, 10 V100 GPU hours were spent on training a model with a resolution of 512, while 60 V100 GPU hours were spent on training a model with a resolution of 768. An additional 50 V100 GPU hours were also spent on training a model with a resolution of 1024, although only 10 epochs were run. The results of the training on the 1024 resolution model did not show a significant improvement compared to the 768 resolution model, and the resource demands, achieving a batch size of 1 on a V100 with 32G VRAM, were high. However, training on the 768 resolution did yield better results than training on the 512 resolution, and it is worth considering as an option. It is worth noting that Stable Diffusion 2.x also chose to train on a 768 resolution model. However, it may be more efficient to start with training on a 512 resolution model due to the slower training process and the need for additional prior knowledge to speed up the training process when working with a 768 resolution.
+ The training of a high resolution model requires a significant amount of GPU
+ hours and can be costly. In this particular case, 10 V100 GPU hours were spent
+ on training 30 epochs with a resolution of 512, while 60 V100 GPU hours were spent
+ on training 30 epochs with a resolution of 768. An additional 100 V100 GPU hours
+ were also spent on training a model with a resolution of 1024, although **ONLY** 10
+ epochs were run. The results of the training on the 1024 resolution model did
+ not show a significant improvement compared to the 768 resolution model, and the
+ resource demands, achieving a batch size of 1 on a V100 with 32G VRAM, were
+ high. However, training on the 768 resolution did yield better results than
+ training on the 512 resolution, and it is worth considering as an option. It is
+ worth noting that Stable Diffusion 2.x also chose to train on a 768 resolution
+ model. However, it may be more efficient to start with training on a 512
+ resolution model due to the slower training process and the need for additional
+ prior knowledge to speed up the training process when working with a 768
+ resolution.
 
  [Mikubill/naifu-diffusion](https://github.com/Mikubill/naifu-diffusion) is used as training script and I also recommend to
  checkout [CCRcmcpe/scal-sdt](https://github.com/CCRcmcpe/scal-sdt).
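
Note on the added paragraph: the GPU-hour figures are easier to compare once normalized per epoch. The sketch below is not part of the commit; it only redoes the arithmetic from the `+` lines (10 h / 30 epochs at 512, 60 h / 30 epochs at 768, 100 h / 10 epochs at 1024). It assumes every run used the same dataset size per epoch, which the README does not state explicitly, and the names `RUNS`, `hours_per_epoch`, `relative_cost` and `pixel_ratio` are made up for illustration.

```python
# GPU-hour figures copied from the added (+) lines of this diff.
# Assumption: each epoch covers the same number of images at every resolution.
RUNS = {
    512: {"gpu_hours": 10, "epochs": 30},
    768: {"gpu_hours": 60, "epochs": 30},
    1024: {"gpu_hours": 100, "epochs": 10},
}


def hours_per_epoch(resolution: int) -> float:
    """V100 GPU hours spent per epoch at the given resolution."""
    run = RUNS[resolution]
    return run["gpu_hours"] / run["epochs"]


def relative_cost(resolution: int, baseline: int = 512) -> float:
    """Per-epoch cost relative to the baseline resolution."""
    return hours_per_epoch(resolution) / hours_per_epoch(baseline)


def pixel_ratio(resolution: int, baseline: int = 512) -> float:
    """Pixels per square image relative to the baseline resolution."""
    return (resolution / baseline) ** 2


for res in RUNS:
    print(
        f"{res}px: {hours_per_epoch(res):.2f} V100 h/epoch, "
        f"{relative_cost(res):.1f}x the 512 cost, "
        f"{pixel_ratio(res):.2f}x the pixels"
    )
```

Under that assumption, an epoch at 768 costs about 6 times as much as one at 512, and an epoch at 1024 about 30 times as much, while the pixel counts grow only 2.25 and 4 times. That gap is consistent with the paragraph's remark that the 1024 run was limited to a batch size of 1 on a 32G V100, though the diff itself does not break down where the extra time goes.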