update model card for epoch 15 ckpt
README.md CHANGED
@@ -9,10 +9,10 @@ base_model:
A latent diffusion model (LDM) geared toward illustration, style composability, and sample variety. Addresses a few deficiencies with the SDXL base model.

* Architecture: SD XL (base model is v1.0)
-* Training procedure: U-Net fully unfrozen, all-parameter continued pretraining at LR between 3e-8 and 3e-7 for
+* Training procedure: U-Net fully unfrozen, all-parameter continued pretraining at LR between 3e-8 and 3e-7 for 15,800,000 steps (at epoch 15, batch size 4).

Trained on the Puzzle Box dataset, a large collection of permissively licensed images from the public Internet (or generated by previous Puzzle Box models). Each image
-has from 3 to
+has from 3 to 17 different captions which are used interchangeably during training. There are 9.3 million images and 62 million captions in the dataset.

The model is substantially better than the base SDXL model at producing images that look like film photographs, any kind of cartoon art, or old artist styles. It's also
heavily tuned toward personal aesthetic preference.
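For reference, a minimal sketch of how a training loader could draw from an image's 3 to 17 captions interchangeably. This is not the actual Puzzle Box training code; the class, record layout, and field names are assumptions for illustration:

```python
import random
from PIL import Image
from torch.utils.data import Dataset

class MultiCaptionDataset(Dataset):
    """Each record holds one image path plus several captions; a caption
    is drawn at random on every access, so all captions are used
    interchangeably over the course of training."""

    def __init__(self, records, transform=None):
        # records: e.g. [{"image_path": ..., "captions": [...]}, ...]
        self.records = records
        self.transform = transform

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        record = self.records[idx]
        image = Image.open(record["image_path"]).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        caption = random.choice(record["captions"])  # fresh draw each access
        return image, caption
```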
@@ -27,7 +27,7 @@ megapixels for epoch 15. CFG scales between 2 and 7 can work well with Puzzle Bo

**Captioning:** About 1.4 million of the captions in the dataset are human-written. The remainder come from a variety of ML models, either vision transformers or
classifiers. Models used in captioning the Puzzle Box dataset include: Qwen 2 VL 72b, BLIP 2 OPT-6.5B COCO, Llava 1.5, MiniCPM 2.6, bakllava, Moondream, DeepSeek Janus 7b,
-Mistral Pixtral 12b, CapPa, and wd-eva02-large-tagger-v3. Only open-weights models were used.
+Mistral Pixtral 12b, CapPa, Gemma 3 27b, JoyCaption, and wd-eva02-large-tagger-v3. Only open-weights models were used.

In addition to the human/machine-generated main caption, there are a large number of additional human-provided tags referring to style ("pointillism", "caricature", "Winsor McKay"),
genre ("pop art", "advertising", "pixel art"), source ("wikiart", "library of congress"), or image content ("fluid expression", "pin-up", "squash and stretch").
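A minimal inference sketch with diffusers, assuming the checkpoint is published as a standard SDXL pipeline (the repo id below is a placeholder, not the actual repository name); the card's suggested CFG range of 2 to 7 maps to `guidance_scale`:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Placeholder repo id; substitute the actual Puzzle Box repository.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "your-namespace/puzzle-box-sdxl",
    torch_dtype=torch.float16,
).to("cuda")

# CFG scales between 2 and 7 can work well per the card; lower values
# favor sample variety, higher values favor prompt adherence.
image = pipe(
    "a pointillism illustration of a lighthouse at dusk",
    guidance_scale=4.0,
    num_inference_steps=30,
).images[0]
image.save("sample.png")
```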
@@ -46,6 +46,7 @@ This allows later checkpoints to generate 1+ megapixel images without tiling or

Model checkpoints currently available:

+- from epoch 15, **15800k** training steps, 08 March 2025
- from epoch 14, **14290k** training steps, 02 December 2024
- from epoch 13, **11930k** training steps, 15 August 2024
- from epoch 12, **10570k** training steps, 21 June 2024
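If the per-epoch checkpoints ship as single safetensors files (an assumption; the filename below is hypothetical), a specific epoch can be loaded directly:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Hypothetical filename; choose the file matching the epoch you want.
pipe = StableDiffusionXLPipeline.from_single_file(
    "puzzle_box_epoch15_15800k.safetensors",
    torch_dtype=torch.float16,
)
```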