bananapuncakes committed on
Commit
288dee8
1 Parent(s): d4aff25

Trained for 3 epochs and 100 steps.


Trained with datasets ['4o-training-embeds-thinned', '4o-training-images-thinned']
Learning rate 0.0001, batch size 2, and 40 gradient accumulation steps.
Used the DDPM noise scheduler for training with the epsilon prediction type, rescaled_betas_zero_snr=False, and 'trailing' timestep spacing.
Base model: PixArt-alpha/PixArt-Sigma-XL-2-1024-MS
VAE: madebyollin/sdxl-vae-fp16-fix

README.md ADDED
@@ -0,0 +1,111 @@
+ ---
+ license: creativeml-openrail-m
+ base_model: "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS"
+ tags:
+ - stable-diffusion
+ - stable-diffusion-diffusers
+ - text-to-image
+ - diffusers
+ - simpletuner
+ - full
+
+ inference: true
+ widget:
+ - text: 'unconditional (blank prompt)'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_0_0.png
+ - text: 'Digital art of a topless anthro male wolf wearing a sun hat and blue banana-patterned swimming trunks'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_1_0.png
+ ---
+
+ # pixart-sigma-test
+
+ This is a full rank finetune derived from [PixArt-alpha/PixArt-Sigma-XL-2-1024-MS](https://huggingface.co/PixArt-alpha/PixArt-Sigma-XL-2-1024-MS).
+
+ The main validation prompt used during training was:
+
+ ```
+ Digital art of a topless anthro male wolf wearing a sun hat and blue banana-patterned swimming trunks
+ ```
+
+ ## Validation settings
+ - CFG: `7.5`
+ - CFG Rescale: `0.0`
+ - Steps: `30`
+ - Sampler: `None`
+ - Seed: `42`
+ - Resolution: `1024`
+
+ Note: The validation settings are not necessarily the same as the [training settings](#training-settings).
+
+ You can find some example images in the following gallery:
+
+ <Gallery />
+
+ The text encoder **was not** trained.
+ You may reuse the base model text encoder for inference.
+
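+ As a minimal, non-official sketch of reusing the base model's frozen T5 text encoder with `diffusers` and `transformers`: `model_id` below is assumed to point at this checkpoint, and the subfolder names follow the base repository's layout.
+
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline
+ from transformers import T5EncoderModel, T5Tokenizer
+
+ base_model = "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS"
+ model_id = "pixart-sigma-test"  # assumed path/ID of this finetune
+
+ # Load the untouched text encoder and tokenizer from the base model.
+ text_encoder = T5EncoderModel.from_pretrained(base_model, subfolder="text_encoder", torch_dtype=torch.bfloat16)
+ tokenizer = T5Tokenizer.from_pretrained(base_model, subfolder="tokenizer")
+
+ # Build the finetuned pipeline, overriding its text components with the base ones.
+ pipeline = DiffusionPipeline.from_pretrained(
+     model_id,
+     text_encoder=text_encoder,
+     tokenizer=tokenizer,
+     torch_dtype=torch.bfloat16,
+ )
+ ```
+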
+ ## Training settings
+
+ - Training epochs: 3
+ - Training steps: 100
+ - Learning rate: 0.0001
+ - Effective batch size: 160 (see the sketch after this list)
+ - Micro-batch size: 2
+ - Gradient accumulation steps: 40
+ - Number of GPUs: 2
+ - Prediction type: epsilon
+ - Rescaled betas zero SNR: False
+ - Optimizer: AdamW, stochastic bf16
+ - Precision: Pure BF16
+ - Xformers: Enabled
+
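+ The effective batch size above follows from micro-batch size × gradient accumulation steps × number of GPUs: 2 × 40 × 2 = 160. The noise-schedule settings (epsilon prediction, no zero-terminal-SNR rescaling, 'trailing' timestep spacing) roughly correspond to the `diffusers` configuration below; this is a hedged sketch built from the values in this card, not the exact SimpleTuner training code:
+
+ ```python
+ from diffusers import DDPMScheduler
+
+ # Training noise scheduler, starting from the base model's scheduler config
+ # and overriding the values reported in this card (a sketch, not SimpleTuner's code).
+ noise_scheduler = DDPMScheduler.from_pretrained(
+     "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
+     subfolder="scheduler",
+     prediction_type="epsilon",
+     rescale_betas_zero_snr=False,
+     timestep_spacing="trailing",
+ )
+
+ # Effective batch size: 2 (micro-batch) * 40 (grad accumulation) * 2 (GPUs) = 160
+ ```
+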
+ ## Datasets
+
+ ### 4o-training-images-thinned
+ - Repeats: 0
+ - Total number of images: ~4960
+ - Total number of aspect buckets: 1
+ - Resolution: 1.0 megapixels
+ - Cropped: True
+ - Crop style: center (roughly as sketched after this list)
+ - Crop aspect: square
+
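+ The cropping described above amounts to a center square crop followed by a resize to the 1.0-megapixel bucket. A rough, assumed illustration (not SimpleTuner's actual data loader):
+
+ ```python
+ from PIL import Image
+
+ def center_square_1mp(path: str) -> Image.Image:
+     """Center-crop to a square, then resize to ~1.0 megapixel (1024x1024)."""
+     img = Image.open(path).convert("RGB")
+     side = min(img.size)
+     left = (img.width - side) // 2
+     top = (img.height - side) // 2
+     img = img.crop((left, top, left + side, top + side))
+     return img.resize((1024, 1024), Image.LANCZOS)
+ ```
+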
+ ## Inference
+
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline
+
+ model_id = 'pixart-sigma-test'
+ pipeline = DiffusionPipeline.from_pretrained(model_id)
+
+ prompt = "Digital art of a topless anthro male wolf wearing a sun hat and blue banana-patterned swimming trunks"
+ negative_prompt = "blurry, cropped, ugly"
+
+ # Pick the best available device.
+ device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
+ pipeline.to(device)
+
+ image = pipeline(
+     prompt=prompt,
+     negative_prompt=negative_prompt,
+     num_inference_steps=30,
+     generator=torch.Generator(device=device).manual_seed(1641421826),
+     width=1152,
+     height=768,
+     guidance_scale=7.5,
+     guidance_rescale=0.0,
+ ).images[0]
+ image.save("output.png", format="PNG")
+ ```
+
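+ If GPU memory is tight, it may also help to load the pipeline in the same precision the model was trained in, e.g. `DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)`; this is optional and not part of the original example.
+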
optimizer.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:91b34e3ecb83bd412a30a6ce6b4d6abc7d4d644672b50cecd71c2bf385255f01
+ size 3665677155
random_states_0.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:89eff11307725d964676a11b97b677c0871dcb58d799618b6e4b5d6bae53b92f
+ size 14604
scheduler.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ba77877730b79876fc3e1695338ca017ac9048f044e8311f771f5ace8bdd67fb
+ size 1000
training_state-4o-training-images-thinned.json ADDED
The diff for this file is too large to render.
 
training_state.json ADDED
@@ -0,0 +1 @@
+ {"global_step": 100, "epoch_step": 100, "epoch": 4, "exhausted_backends": [], "repeats": {"4o-training-images-thinned": 0}}
transformer/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "_class_name": "PixArtTransformer2DModel",
+   "_diffusers_version": "0.30.0.dev0",
+   "_name_or_path": "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
+   "activation_fn": "gelu-approximate",
+   "attention_bias": true,
+   "attention_head_dim": 72,
+   "attention_type": "default",
+   "caption_channels": 4096,
+   "cross_attention_dim": 1152,
+   "double_self_attention": false,
+   "dropout": 0.0,
+   "in_channels": 4,
+   "interpolation_scale": 2,
+   "norm_elementwise_affine": false,
+   "norm_eps": 1e-06,
+   "norm_num_groups": 32,
+   "norm_type": "ada_norm_single",
+   "num_attention_heads": 16,
+   "num_embeds_ada_norm": 1000,
+   "num_layers": 28,
+   "num_vector_embeds": null,
+   "only_cross_attention": false,
+   "out_channels": 8,
+   "patch_size": 2,
+   "sample_size": 128,
+   "upcast_attention": false,
+   "use_additional_conditions": false,
+   "use_linear_projection": false
+ }
transformer/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f3c20a20ad6785eb582affec1cec5f06bdb9cfc1bac236b06d1fc49b4ad8930
+ size 1221780352