bananapuncakes committed on
Commit
288dee8
1 Parent(s): d4aff25

Trained for 3 epochs and 100 steps.


Trained with datasets ['4o-training-embeds-thinned', '4o-training-images-thinned']
Learning rate 0.0001, batch size 2, and 40 gradient accumulation steps.
Used the DDPM noise scheduler for training with the epsilon prediction type, rescaled_betas_zero_snr=False, and 'trailing' timestep spacing.
Base model: PixArt-alpha/PixArt-Sigma-XL-2-1024-MS
VAE: madebyollin/sdxl-vae-fp16-fix

README.md ADDED
@@ -0,0 +1,111 @@
+ ---
+ license: creativeml-openrail-m
+ base_model: "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS"
+ tags:
+ - stable-diffusion
+ - stable-diffusion-diffusers
+ - text-to-image
+ - diffusers
+ - simpletuner
+ - full
+
+ inference: true
+ widget:
+ - text: 'unconditional (blank prompt)'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_0_0.png
+ - text: 'Digital art of a topless anthro male wolf wearing a sun hat and blue banana-patterned swimming trunks'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_1_0.png
+ ---
+
+ # pixart-sigma-test
+
+ This is a full rank finetune derived from [PixArt-alpha/PixArt-Sigma-XL-2-1024-MS](https://huggingface.co/PixArt-alpha/PixArt-Sigma-XL-2-1024-MS).
+
+ The main validation prompt used during training was:
+
+ ```
+ Digital art of a topless anthro male wolf wearing a sun hat and blue banana-patterned swimming trunks
+ ```
+
+ ## Validation settings
+ - CFG: `7.5`
+ - CFG Rescale: `0.0`
+ - Steps: `30`
+ - Sampler: `None`
+ - Seed: `42`
+ - Resolution: `1024`
+
+ Note: The validation settings are not necessarily the same as the [training settings](#training-settings).
+
+ You can find some example images in the following gallery:
+
+ <Gallery />
+
+ The text encoder **was not** trained.
+ You may reuse the base model text encoder for inference.
+
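+ As a minimal, non-official sketch of reusing the base model's frozen T5 text encoder with `diffusers` and `transformers`: `model_id` below is assumed to point at this checkpoint, and the subfolder names follow the base repository's layout.
+
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline
+ from transformers import T5EncoderModel, T5Tokenizer
+
+ base_model = "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS"
+ model_id = "pixart-sigma-test"  # assumed path/ID of this finetune
+
+ # Load the untouched text encoder and tokenizer from the base model.
+ text_encoder = T5EncoderModel.from_pretrained(base_model, subfolder="text_encoder", torch_dtype=torch.bfloat16)
+ tokenizer = T5Tokenizer.from_pretrained(base_model, subfolder="tokenizer")
+
+ # Build the finetuned pipeline, overriding its text components with the base ones.
+ pipeline = DiffusionPipeline.from_pretrained(
+     model_id,
+     text_encoder=text_encoder,
+     tokenizer=tokenizer,
+     torch_dtype=torch.bfloat16,
+ )
+ ```
+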
+ ## Training settings
+
+ - Training epochs: 3
+ - Training steps: 100
+ - Learning rate: 0.0001
+ - Effective batch size: 160 (see the sketch after this list)
+ - Micro-batch size: 2
+ - Gradient accumulation steps: 40
+ - Number of GPUs: 2
+ - Prediction type: epsilon
+ - Rescaled betas zero SNR: False
+ - Optimizer: AdamW, stochastic bf16
+ - Precision: Pure BF16
+ - Xformers: Enabled
+
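+ The effective batch size above follows from micro-batch size × gradient accumulation steps × number of GPUs: 2 × 40 × 2 = 160. The noise-schedule settings (epsilon prediction, no zero-terminal-SNR rescaling, 'trailing' timestep spacing) roughly correspond to the `diffusers` configuration below; this is a hedged sketch built from the values in this card, not the exact SimpleTuner training code:
+
+ ```python
+ from diffusers import DDPMScheduler
+
+ # Training noise scheduler, starting from the base model's scheduler config
+ # and overriding the values reported in this card (a sketch, not SimpleTuner's code).
+ noise_scheduler = DDPMScheduler.from_pretrained(
+     "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
+     subfolder="scheduler",
+     prediction_type="epsilon",
+     rescale_betas_zero_snr=False,
+     timestep_spacing="trailing",
+ )
+
+ # Effective batch size: 2 (micro-batch) * 40 (grad accumulation) * 2 (GPUs) = 160
+ ```
+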
+ ## Datasets
+
+ ### 4o-training-images-thinned
+ - Repeats: 0
+ - Total number of images: ~4960
+ - Total number of aspect buckets: 1
+ - Resolution: 1.0 megapixels
+ - Cropped: True
+ - Crop style: center (roughly as sketched after this list)
+ - Crop aspect: square
+
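+ The cropping described above amounts to a center square crop followed by a resize to the 1.0-megapixel bucket. A rough, assumed illustration (not SimpleTuner's actual data loader):
+
+ ```python
+ from PIL import Image
+
+ def center_square_1mp(path: str) -> Image.Image:
+     """Center-crop to a square, then resize to ~1.0 megapixel (1024x1024)."""
+     img = Image.open(path).convert("RGB")
+     side = min(img.size)
+     left = (img.width - side) // 2
+     top = (img.height - side) // 2
+     img = img.crop((left, top, left + side, top + side))
+     return img.resize((1024, 1024), Image.LANCZOS)
+ ```
+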
+ ## Inference
+
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline
+
+ model_id = 'pixart-sigma-test'
+ pipeline = DiffusionPipeline.from_pretrained(model_id)
+
+ prompt = "Digital art of a topless anthro male wolf wearing a sun hat and blue banana-patterned swimming trunks"
+ negative_prompt = "blurry, cropped, ugly"
+
+ # Pick the best available device.
+ device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
+ pipeline.to(device)
+
+ image = pipeline(
+     prompt=prompt,
+     negative_prompt=negative_prompt,
+     num_inference_steps=30,
+     generator=torch.Generator(device=device).manual_seed(1641421826),
+     width=1152,
+     height=768,
+     guidance_scale=7.5,
+     guidance_rescale=0.0,
+ ).images[0]
+ image.save("output.png", format="PNG")
+ ```
+
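+ If GPU memory is tight, it may also help to load the pipeline in the same precision the model was trained in, e.g. `DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)`; this is optional and not part of the original example.
+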
optimizer.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:91b34e3ecb83bd412a30a6ce6b4d6abc7d4d644672b50cecd71c2bf385255f01
+ size 3665677155
random_states_0.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:89eff11307725d964676a11b97b677c0871dcb58d799618b6e4b5d6bae53b92f
+ size 14604
scheduler.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ba77877730b79876fc3e1695338ca017ac9048f044e8311f771f5ace8bdd67fb
+ size 1000
training_state-4o-training-images-thinned.json ADDED
The diff for this file is too large to render.
 
training_state.json ADDED
@@ -0,0 +1 @@
+ {"global_step": 100, "epoch_step": 100, "epoch": 4, "exhausted_backends": [], "repeats": {"4o-training-images-thinned": 0}}
transformer/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "_class_name": "PixArtTransformer2DModel",
+   "_diffusers_version": "0.30.0.dev0",
+   "_name_or_path": "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
+   "activation_fn": "gelu-approximate",
+   "attention_bias": true,
+   "attention_head_dim": 72,
+   "attention_type": "default",
+   "caption_channels": 4096,
+   "cross_attention_dim": 1152,
+   "double_self_attention": false,
+   "dropout": 0.0,
+   "in_channels": 4,
+   "interpolation_scale": 2,
+   "norm_elementwise_affine": false,
+   "norm_eps": 1e-06,
+   "norm_num_groups": 32,
+   "norm_type": "ada_norm_single",
+   "num_attention_heads": 16,
+   "num_embeds_ada_norm": 1000,
+   "num_layers": 28,
+   "num_vector_embeds": null,
+   "only_cross_attention": false,
+   "out_channels": 8,
+   "patch_size": 2,
+   "sample_size": 128,
+   "upcast_attention": false,
+   "use_additional_conditions": false,
+   "use_linear_projection": false
+ }
transformer/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f3c20a20ad6785eb582affec1cec5f06bdb9cfc1bac236b06d1fc49b4ad8930
+ size 1221780352