codehappy/puzzlebox-xl

codehappy committed 15 days ago

Commit ba6a38b (verified) · 1 Parent(s): 205e9a6

update model card for epoch 15 ckpt

Files changed (1)

README.md +4 -3

@@ -9,10 +9,10 @@ base_model:
 A latent diffusion model (LDM) geared toward illustration, style composability, and sample variety. Addresses a few deficiencies with the SDXL base model.
 * Architecture: SD XL (base model is v1.0)
-* Training procedure: U-Net fully unfrozen, all-parameter continued pretraining at LR between 3e-8 and 3e-7 for 14,290,000 steps (at epoch 14, batch size 4).
+* Training procedure: U-Net fully unfrozen, all-parameter continued pretraining at LR between 3e-8 and 3e-7 for 15,800,000 steps (at epoch 15, batch size 4).
 Trained on the Puzzle Box dataset, a large collection of permissively licensed images from the public Internet (or generated by previous Puzzle Box models). Each image
-has from 3 to 15 different captions which are used interchangeably during training. There are 8.2 million images and 54 million captions in the dataset.
+has from 3 to 17 different captions which are used interchangeably during training. There are 9.3 million images and 62 million captions in the dataset.
 The model is substantially better than the base SDXL model at producing images that look like film photographs, any kind of cartoon art, or old artist styles. It's also
 heavily tuned toward personal aesthetic preference.
@@ -27,7 +27,7 @@ megapixels for epoch 15. CFG scales between 2 and 7 can work well with Puzzle Bo
 **Captioning:** About 1.4 million of the captions in the dataset are human-written. The remainder come from a variety of ML models, either vision transformers or
 classifiers. Models used in captioning the Puzzle Box dataset include: Qwen 2 VL 72b, BLIP 2 OPT-6.5B COCO, Llava 1.5, MiniCPM 2.6, bakllava, Moondream, DeepSeek Janus 7b,
-Mistral Pixtral 12b, CapPa, and wd-eva02-large-tagger-v3. Only open-weights models were used.
+Mistral Pixtral 12b, CapPa, Gemma 3 27b, JoyCaption, and wd-eva02-large-tagger-v3. Only open-weights models were used.
 In addition to human/machine-generated main caption, there are a large number of additional human-provided tags referring to style ("pointillism", "caricature", "Winsor McKay"),
 genre ("pop art", "advertising", "pixel art"), source ("wikiart", "library of congress"), or image content ("fluid expression", "pin-up", "squash and stretch").
@@ -46,6 +46,7 @@ This allows later checkpoints to generate 1+ megapixel images without tiling or
 Model checkpoints currently available:
+- from epoch 15, **15800k** training steps, 08 March 2025
 - from epoch 14, **14290k** training steps, 02 December 2024
 - from epoch 13, **11930k** training steps, 15 August 2024
 - from epoch 12, **10570k** training steps, 21 June 2024
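
As a rough cross-check of the step counts in this change, the sketch below computes how many steps separate two checkpoints and how many training samples that would correspond to. The step totals are copied from the checkpoint list; the batch size of 4 is the value the card states for the latest epoch, and treating it as constant across all epochs is an assumption, not something the card confirms. The names `CHECKPOINT_STEPS`, `steps_between`, and `samples_seen` are illustrative only.

```python
# Cumulative training steps per checkpoint, copied from the model card.
CHECKPOINT_STEPS = {
    12: 10_570_000,
    13: 11_930_000,
    14: 14_290_000,
    15: 15_800_000,
}

# ASSUMPTION: the card gives batch size 4 for the latest epoch; we treat it
# as constant for every step, which the card does not state.
ASSUMED_BATCH_SIZE = 4


def steps_between(earlier_epoch: int, later_epoch: int) -> int:
    """Training steps added between two listed checkpoints."""
    return CHECKPOINT_STEPS[later_epoch] - CHECKPOINT_STEPS[earlier_epoch]


def samples_seen(steps: int, batch_size: int = ASSUMED_BATCH_SIZE) -> int:
    """Image/caption samples consumed over `steps` optimizer steps."""
    return steps * batch_size


if __name__ == "__main__":
    epochs = sorted(CHECKPOINT_STEPS)
    for prev, cur in zip(epochs, epochs[1:]):
        delta = steps_between(prev, cur)
        print(f"epoch {prev} -> {cur}: {delta:,} steps, "
              f"{samples_seen(delta):,} samples at batch size {ASSUMED_BATCH_SIZE}")
```

Under that assumption, the epoch 15 checkpoint adds 1,510,000 steps over epoch 14, which is about 6.04 million samples.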