update README for epoch 16 ckpt
README.md CHANGED
```diff
@@ -9,7 +9,7 @@ base_model:
 A latent diffusion model (LDM) geared toward illustration, style composability, and sample variety. Addresses a few deficiencies of the SDXL base model.
 
 * Architecture: SD XL (base model is v1.0)
-* Training procedure: U-Net fully unfrozen, all-parameter continued pretraining at LR between 3e-8 and 3e-7 for
+* Training procedure: U-Net fully unfrozen, all-parameter continued pretraining at LR between 3e-8 and 3e-7 for 16,950,000 steps (at epoch 16, batch size 4).
 
 Trained on the Puzzle Box dataset, a large collection of permissively licensed images from the public Internet (or generated by previous Puzzle Box models). Each image
 has from 3 to 17 different captions, which are used interchangeably during training. There are 9.3 million images and 62 million captions in the dataset.
@@ -23,7 +23,7 @@ booru-style, don't use underscores in your tags, replace those with spaces. Tags
 Vitamin phrases: *top quartile*, *top decile* (there are also anti-vitamins, *bottom quartile* and *bottom decile*). These are the primary aesthetic labels (see below).
 
 Prompt adherence is unusually good; aesthetics are improved by human evaluation for generations between 1/4 and 1/2 megapixel in size for epochs 12-14, 1/4 to 2
-megapixels for epoch 15
+megapixels for epoch 15+. CFG scales between 2 and 7 can work well with Puzzle Box; experimenting with resolution or scale for your prompts is encouraged.
 
 **Captioning:** About 1.4 million of the captions in the dataset are human-written. The remainder come from a variety of ML models, either vision transformers or
 classifiers. Models used in captioning the Puzzle Box dataset include: Qwen 2 VL 72b, BLIP 2 OPT-6.5B COCO, Llava 1.5, MiniCPM 2.6, bakllava, Moondream, DeepSeek Janus 7b,
@@ -46,6 +46,7 @@ This allows later checkpoints to generate 1+ megapixel images without tiling or
 
 Model checkpoints currently available:
 
+- from epoch 16, **16950k** training steps, 05 May 2025
 - from epoch 15, **15800k** training steps, 08 March 2025
 - from epoch 14, **14290k** training steps, 02 December 2024
 - from epoch 13, **11930k** training steps, 15 August 2024
```
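The generation guidance added in this commit (roughly 1/4 to 2 megapixels for epoch 15+ checkpoints, CFG scale between 2 and 7) can be sketched as a small pre-flight check before sampling. This is an illustrative helper, not part of the model's tooling; the function name and the exact cutoffs-as-code are assumptions drawn from the card's prose:

```python
def within_suggested_envelope(width: int, height: int, cfg: float) -> bool:
    """Check generation settings against the card's guidance for
    epoch 15+ checkpoints: ~0.25-2 megapixels, CFG scale 2-7."""
    megapixels = width * height / 1_000_000
    return 0.25 <= megapixels <= 2.0 and 2 <= cfg <= 7

print(within_suggested_envelope(1024, 1024, 4.5))  # True  (~1.05 MP, CFG in range)
print(within_suggested_envelope(512, 512, 9.0))    # False (CFG above 7)
```

Settings outside this envelope aren't forbidden — the card explicitly encourages experimenting with resolution and scale — but this range is a reasonable starting point.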