update README for epoch 16 ckpt
README.md CHANGED
```diff
@@ -9,7 +9,7 @@ base_model:
 A latent diffusion model (LDM) geared toward illustration, style composability, and sample variety. Addresses a few deficiencies of the SDXL base model.
 
 * Architecture: SD XL (base model is v1.0)
-* Training procedure: U-Net fully unfrozen, all-parameter continued pretraining at LR between 3e-8 and 3e-7 for
+* Training procedure: U-Net fully unfrozen, all-parameter continued pretraining at LR between 3e-8 and 3e-7 for 16,950,000 steps (at epoch 16, batch size 4).
 
 Trained on the Puzzle Box dataset, a large collection of permissively licensed images from the public Internet (or generated by previous Puzzle Box models). Each image
 has from 3 to 17 different captions, which are used interchangeably during training. There are 9.3 million images and 62 million captions in the dataset.
@@ -23,7 +23,7 @@ booru-style, don't use underscores in your tags, replace those with spaces. Tags
 Vitamin phrases: *top quartile*, *top decile* (there are also anti-vitamins, *bottom quartile* and *bottom decile*). These are the primary aesthetic labels (see below).
 
 Prompt adherence is unusually good; aesthetics are improved by human evaluation for generations between 1/4 and 1/2 megapixel in size for epochs 12-14, 1/4 to 2
-megapixels for epoch 15
+megapixels for epoch 15+. CFG scales between 2 and 7 can work well with Puzzle Box; experimenting with resolution or scale for your prompts is encouraged.
 
 **Captioning:** About 1.4 million of the captions in the dataset are human-written. The remainder come from a variety of ML models, either vision transformers or
 classifiers. Models used in captioning the Puzzle Box dataset include: Qwen 2 VL 72b, BLIP 2 OPT-6.5B COCO, Llava 1.5, MiniCPM 2.6, bakllava, Moondream, DeepSeek Janus 7b,
@@ -46,6 +46,7 @@ This allows later checkpoints to generate 1+ megapixel images without tiling or
 
 Model checkpoints currently available:
 
+- from epoch 16, **16950k** training steps, 05 May 2025
 - from epoch 15, **15800k** training steps, 08 March 2025
 - from epoch 14, **14290k** training steps, 02 December 2024
 - from epoch 13, **11930k** training steps, 15 August 2024
```
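The generation guidance added in this commit (roughly 1/4 to 2 megapixels for epoch 15+ checkpoints, CFG scale between 2 and 7) can be sketched as a small pre-flight check before sampling. This is an illustrative helper, not part of the model's tooling; the function name and the exact cutoffs-as-code are assumptions drawn from the card's prose:

```python
def within_suggested_envelope(width: int, height: int, cfg: float) -> bool:
    """Check generation settings against the card's guidance for
    epoch 15+ checkpoints: ~0.25-2 megapixels, CFG scale 2-7."""
    megapixels = width * height / 1_000_000
    return 0.25 <= megapixels <= 2.0 and 2 <= cfg <= 7

print(within_suggested_envelope(1024, 1024, 4.5))  # True  (~1.05 MP, CFG in range)
print(within_suggested_envelope(512, 512, 9.0))    # False (CFG above 7)
```

Settings outside this envelope aren't forbidden — the card explicitly encourages experimenting with resolution and scale — but this range is a reasonable starting point.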