Commit ba6a38b (verified), committed by codehappy
1 Parent(s): 205e9a6

update model card for epoch 15 ckpt

  1. README.md +4 -3
README.md CHANGED
@@ -9,10 +9,10 @@ base_model:
  A latent diffusion model (LDM) geared toward illustration, style composability, and sample variety. Addresses a few deficiencies with the SDXL base model.
 
  * Architecture: SD XL (base model is v1.0)
- * Training procedure: U-Net fully unfrozen, all-parameter continued pretraining at LR between 3e-8 and 3e-7 for 14,290,000 steps (at epoch 14, batch size 4).
 
  Trained on the Puzzle Box dataset, a large collection of permissively licensed images from the public Internet (or generated by previous Puzzle Box models). Each image
- has from 3 to 15 different captions which are used interchangeably during training. There are 8.2 million images and 54 million captions in the dataset.
 
  The model is substantially better than the base SDXL model at producing images that look like film photographs, any kind of cartoon art, or old artist styles. It's also
  heavily tuned toward personal aesthetic preference.
@@ -27,7 +27,7 @@ megapixels for epoch 15. CFG scales between 2 and 7 can work well with Puzzle Bo
 
  **Captioning:** About 1.4 million of the captions in the dataset are human-written. The remainder come from a variety of ML models, either vision transformers or
  classifiers. Models used in captioning the Puzzle Box dataset include: Qwen 2 VL 72b, BLIP 2 OPT-6.5B COCO, Llava 1.5, MiniCPM 2.6, bakllava, Moondream, DeepSeek Janus 7b,
- Mistral Pixtral 12b, CapPa, and wd-eva02-large-tagger-v3. Only open-weights models were used.
 
  In addition to the human/machine-generated main caption, there are a large number of additional human-provided tags referring to style ("pointillism", "caricature", "Winsor McKay"),
  genre ("pop art", "advertising", "pixel art"), source ("wikiart", "library of congress"), or image content ("fluid expression", "pin-up", "squash and stretch").
@@ -46,6 +46,7 @@ This allows later checkpoints to generate 1+ megapixel images without tiling or
 
  Model checkpoints currently available:
 
  - from epoch 14, **14290k** training steps, 02 December 2024
  - from epoch 13, **11930k** training steps, 15 August 2024
  - from epoch 12, **10570k** training steps, 21 June 2024
 
  A latent diffusion model (LDM) geared toward illustration, style composability, and sample variety. Addresses a few deficiencies with the SDXL base model.
 
  * Architecture: SD XL (base model is v1.0)
+ * Training procedure: U-Net fully unfrozen, all-parameter continued pretraining at LR between 3e-8 and 3e-7 for 15,800,000 steps (at epoch 15, batch size 4).
 
  Trained on the Puzzle Box dataset, a large collection of permissively licensed images from the public Internet (or generated by previous Puzzle Box models). Each image
+ has from 3 to 17 different captions which are used interchangeably during training. There are 9.3 million images and 62 million captions in the dataset.
 
  The model is substantially better than the base SDXL model at producing images that look like film photographs, any kind of cartoon art, or old artist styles. It's also
  heavily tuned toward personal aesthetic preference.
 
 
  **Captioning:** About 1.4 million of the captions in the dataset are human-written. The remainder come from a variety of ML models, either vision transformers or
  classifiers. Models used in captioning the Puzzle Box dataset include: Qwen 2 VL 72b, BLIP 2 OPT-6.5B COCO, Llava 1.5, MiniCPM 2.6, bakllava, Moondream, DeepSeek Janus 7b,
+ Mistral Pixtral 12b, CapPa, Gemma 3 27b, JoyCaption, and wd-eva02-large-tagger-v3. Only open-weights models were used.
 
  In addition to the human/machine-generated main caption, there are a large number of additional human-provided tags referring to style ("pointillism", "caricature", "Winsor McKay"),
  genre ("pop art", "advertising", "pixel art"), source ("wikiart", "library of congress"), or image content ("fluid expression", "pin-up", "squash and stretch").
 
 
  Model checkpoints currently available:
 
+ - from epoch 15, **15800k** training steps, 08 March 2025
  - from epoch 14, **14290k** training steps, 02 December 2024
  - from epoch 13, **11930k** training steps, 15 August 2024
  - from epoch 12, **10570k** training steps, 21 June 2024
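
As a quick sanity check on the numbers this commit touches, the Python sketch below recomputes the step count between listed checkpoints and the average captions per image before and after the update. The figures are copied from the model card. The names `BATCH_SIZE`, `step_deltas`, and `mean_captions_per_image` are illustrative only, and treating one training step as one batch of 4 images is an assumption the card does not spell out.

```python
# Figures copied from the model card. Treating one step as one batch of
# BATCH_SIZE images is an assumption, not something the card states.
BATCH_SIZE = 4

# (epoch, cumulative training steps), newest first, as listed in the card.
checkpoints = [
    (15, 15_800_000),
    (14, 14_290_000),
    (13, 11_930_000),
    (12, 10_570_000),
]


def step_deltas(ckpts):
    """Map each epoch to the steps trained since the previous listed checkpoint."""
    return {
        newer[0]: newer[1] - older[1]
        for newer, older in zip(ckpts, ckpts[1:])
    }


def mean_captions_per_image(captions, images):
    """Average number of captions attached to each image."""
    return captions / images


deltas = step_deltas(checkpoints)
for epoch, delta in deltas.items():
    samples = delta * BATCH_SIZE  # assumes one batch per step
    print(f"epoch {epoch}: +{delta:,} steps (~{samples:,} image samples)")

before = mean_captions_per_image(54e6, 8.2e6)  # dataset size at epoch 14
after = mean_captions_per_image(62e6, 9.3e6)   # dataset size at epoch 15
print(f"captions per image: {before:.2f} before, {after:.2f} after")
```

Both averages sit inside the stated per-image ranges (3 to 15 before, 3 to 17 after), so the updated dataset totals are at least internally consistent.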