The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
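
The first line above is Transformers' one-time cache migration notice. Per the notice itself, an interrupted migration can be resumed manually; a minimal sketch using the function it names:

    # Resume the one-time cache migration mentioned in the notice above.
    from transformers.utils import move_cache

    move_cache()
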
Running 1 job

0it [00:00, ?it/s]
0it [00:00, ?it/s]
/usr/local/lib/python3.10/dist-packages/controlnet_aux/mediapipe_face/mediapipe_face_common.py:7: UserWarning: The module 'mediapipe' is not installed. The package will have limited functionality. Please install it using the command: pip install 'mediapipe'
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_5m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_5m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/usr/local/lib/python3.10/dist-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_11m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_11m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/usr/local/lib/python3.10/dist-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/usr/local/lib/python3.10/dist-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_384 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_384. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/usr/local/lib/python3.10/dist-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_512 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_512. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
{
    "type": "sd_trainer",
    "training_folder": "output",
    "device": "cuda:0",
    "network": {
        "type": "lora",
        "linear": 16,
        "linear_alpha": 16
    },
    "save": {
        "dtype": "float16",
        "save_every": 500,
        "max_step_saves_to_keep": 4,
        "push_to_hub": false
    },
    "datasets": [
        {
            "folder_path": "/workspace/ai-toolkit/images",
            "caption_ext": "txt",
            "caption_dropout_rate": 0.05,
            "shuffle_tokens": false,
            "cache_latents_to_disk": true,
            "resolution": [
                512,
                768,
                1024
            ]
        }
    ],
    "train": {
        "batch_size": 1,
        "steps": 3000,
        "gradient_accumulation_steps": 1,
        "train_unet": true,
        "train_text_encoder": false,
        "gradient_checkpointing": true,
        "noise_scheduler": "flowmatch",
        "optimizer": "adamw8bit",
        "lr": 0.0001,
        "ema_config": {
            "use_ema": true,
            "ema_decay": 0.99
        },
        "dtype": "bf16"
    },
    "model": {
        "name_or_path": "black-forest-labs/FLUX.1-dev",
        "is_flux": true,
        "quantize": true
    },
    "sample": {
        "sampler": "flowmatch",
        "sample_every": 500,
        "width": 1024,
        "height": 1024,
        "prompts": [
            "woman with red hair, playing chess at the park, bomb going off in the background",
            "a woman holding a coffee cup, in a beanie, sitting at a cafe",
            "a horse is a DJ at a night club, fish eye lens, smoke machine, lazer lights, holding a martini",
            "a man showing off his cool new t shirt at the beach, a shark is jumping out of the water in the background",
            "a bear building a log cabin in the snow covered mountains",
            "woman playing the guitar, on stage, singing a song, laser lights, punk rocker",
            "hipster man with a beard, building a chair, in a wood shop",
            "photo of a man, white background, medium shot, modeling clothing, studio lighting, white backdrop",
            "a man holding a sign that says, 'this is a sign'",
            "a bulldog, in a post apocalyptic world, with a shotgun, in a leather jacket, in a desert, with a motorcycle"
        ],
        "neg": "",
        "seed": 42,
        "walk_seed": true,
        "guidance_scale": 4,
        "sample_steps": 20
    },
    "trigger_word": "p3r5on"
}
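
The block above is the trainer's resolved run configuration, echoed as JSON: a rank-16/alpha-16 LoRA on the quantized FLUX.1-dev transformer, bf16 training for 3000 steps at lr 1e-4, multi-resolution buckets at 512/768/1024, and samples plus checkpoints every 500 steps. If you save that dump, it is easy to reload and tweak between runs; a small sketch (the filename is hypothetical):

    import json

    # Hypothetical path: wherever you saved the JSON dump printed above.
    with open("my_first_flux_lora_v1_config.json") as f:
        cfg = json.load(f)

    cfg["train"]["steps"] = 4000          # e.g. train longer
    cfg["sample"]["sample_every"] = 250   # and sample more often
    print(json.dumps(cfg["train"], indent=4))
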
Using EMA

#############################################
# Running job: my_first_flux_lora_v1
#############################################


Running  1 process
Loading Flux model
Loading transformer
Quantizing transformer
Loading vae
Loading t5

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]
Downloading shards:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ     | 1/2 [00:26<00:26, 26.58s/it]
Downloading shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:48<00:00, 23.78s/it]
Downloading shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:48<00:00, 24.20s/it]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ     | 1/2 [00:00<00:00,  5.41it/s]
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:00<00:00,  6.00it/s]
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:00<00:00,  5.90it/s]
Quantizing T5
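
"Quantizing transformer" and "Quantizing T5" correspond to the config's "quantize": true; weights are converted to 8-bit so the model fits in VRAM. A minimal sketch of what that step looks like with optimum-quanto (the backend is an assumption here; ai-toolkit's exact calls may differ):

    from optimum.quanto import quantize, freeze, qfloat8

    def quantize_module(module):
        """Swap Linear weights for float8 and drop the full-precision copies."""
        quantize(module, weights=qfloat8)  # mark weights for float8 quantization
        freeze(module)                     # materialize the quantized tensors
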
Loading clip
making pipe
preparing
create LoRA network. base dim (rank): 16, alpha: 16
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder: 0 modules.
create LoRA for U-Net: 494 modules.
enable LoRA for U-Net
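
The network lines reflect the config's linear: 16 / linear_alpha: 16: each of the 494 adapted linear modules gets a rank-16 update scaled by alpha/rank = 1.0, while the text encoder stays untouched (train_text_encoder: false). A minimal sketch of such an adapter (illustrative, not ai-toolkit's exact module):

    import torch.nn as nn

    class LoRALinear(nn.Module):
        """y = base(x) + (alpha/rank) * up(down(x)); starts as an exact no-op."""
        def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
            super().__init__()
            self.base = base                        # frozen pretrained layer
            self.down = nn.Linear(base.in_features, rank, bias=False)
            self.up = nn.Linear(rank, base.out_features, bias=False)
            nn.init.normal_(self.down.weight, std=1.0 / rank)
            nn.init.zeros_(self.up.weight)          # zero init => no-op at step 0
            self.scale = alpha / rank               # 16 / 16 = 1.0 for this run

        def forward(self, x):
            return self.base(x) + self.scale * self.up(self.down(x))
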
Dataset: /workspace/ai-toolkit/images
  -  Preprocessing image dimensions

  0%|          | 0/11 [00:00<?, ?it/s]
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 11/11 [00:00<00:00, 414.98it/s]
  -  Found 11 images
Bucket sizes for /workspace/ai-toolkit/images:
448x576: 11 files
1 buckets made
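
All 11 images evidently share one aspect ratio, so each resolution pass yields a single bucket; the 512 pass snapped them to 448x576 (multiples of 64, area just under 512x512, roughly 3:4 portrait). A toy sketch of nearest-aspect bucket assignment, purely illustrative (ai-toolkit's actual bucket table and rounding rule differ):

    def buckets(res, step=64):
        """Candidate (w, h) pairs: multiples of `step` with area <= res*res."""
        sides = range(step, 2 * res + 1, step)
        return [(w, h) for w in sides for h in sides if w * h <= res * res]

    def assign(img_w, img_h, res):
        """Closest aspect ratio first, then prefer the largest area."""
        aspect = img_w / img_h
        return min(buckets(res),
                   key=lambda wh: (abs(wh[0] / wh[1] - aspect), -wh[0] * wh[1]))

    print(assign(1344, 1728, 512))   # hypothetical 7:9 source -> (448, 576)
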
Caching latents for /workspace/ai-toolkit/images
 - Saving latents to disk

Caching latents to disk:   0%|          | 0/11 [00:00<?, ?it/s]
Caching latents to disk:   9%|β–‰         | 1/11 [00:00<00:03,  2.65it/s]
Caching latents to disk:  27%|β–ˆβ–ˆβ–‹       | 3/11 [00:00<00:01,  6.49it/s]
Caching latents to disk:  45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ     | 5/11 [00:00<00:00,  8.75it/s]
Caching latents to disk:  64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž   | 7/11 [00:00<00:00, 10.24it/s]
Caching latents to disk:  82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 9/11 [00:00<00:00, 11.33it/s]
Caching latents to disk: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 11/11 [00:01<00:00, 12.11it/s]
Caching latents to disk: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 11/11 [00:01<00:00,  9.80it/s]
Dataset: /workspace/ai-toolkit/images
  -  Preprocessing image dimensions

  0%|          | 0/11 [00:00<?, ?it/s]
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 11/11 [00:00<00:00, 42719.76it/s]
  -  Found 11 images
Bucket sizes for /workspace/ai-toolkit/images:
640x832: 11 files
1 buckets made
Caching latents for /workspace/ai-toolkit/images
 - Saving latents to disk

Caching latents to disk:   0%|          | 0/11 [00:00<?, ?it/s]
Caching latents to disk:   9%|β–‰         | 1/11 [00:00<00:01,  6.85it/s]
Caching latents to disk:  18%|β–ˆβ–Š        | 2/11 [00:00<00:01,  7.43it/s]
Caching latents to disk:  27%|β–ˆβ–ˆβ–‹       | 3/11 [00:00<00:00,  8.13it/s]
Caching latents to disk:  36%|β–ˆβ–ˆβ–ˆβ–‹      | 4/11 [00:00<00:00,  8.52it/s]
Caching latents to disk:  45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ     | 5/11 [00:00<00:00,  8.78it/s]
Caching latents to disk:  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 6/11 [00:00<00:00,  8.90it/s]
Caching latents to disk:  64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž   | 7/11 [00:00<00:00,  9.01it/s]
Caching latents to disk:  73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž  | 8/11 [00:00<00:00,  9.03it/s]
Caching latents to disk:  82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 9/11 [00:01<00:00,  9.02it/s]
Caching latents to disk:  91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 10/11 [00:01<00:00,  9.11it/s]
Caching latents to disk: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 11/11 [00:01<00:00,  8.66it/s]
Caching latents to disk: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 11/11 [00:01<00:00,  8.64it/s]
Dataset: /workspace/ai-toolkit/images
  -  Preprocessing image dimensions

  0%|          | 0/11 [00:00<?, ?it/s]
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 11/11 [00:00<00:00, 35710.02it/s]
  -  Found 11 images
Bucket sizes for /workspace/ai-toolkit/images:
832x1152: 11 files
1 buckets made
Caching latents for /workspace/ai-toolkit/images
 - Saving latents to disk

Caching latents to disk:   0%|          | 0/11 [00:00<?, ?it/s]
Caching latents to disk:   9%|β–‰         | 1/11 [00:00<00:01,  5.11it/s]
Caching latents to disk:  18%|β–ˆβ–Š        | 2/11 [00:00<00:01,  5.29it/s]
Caching latents to disk:  27%|β–ˆβ–ˆβ–‹       | 3/11 [00:00<00:01,  5.41it/s]
Caching latents to disk:  36%|β–ˆβ–ˆβ–ˆβ–‹      | 4/11 [00:00<00:01,  5.48it/s]
Caching latents to disk:  45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ     | 5/11 [00:00<00:01,  5.29it/s]
Caching latents to disk:  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 6/11 [00:01<00:00,  5.42it/s]
Caching latents to disk:  64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž   | 7/11 [00:01<00:00,  5.53it/s]
Caching latents to disk:  73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž  | 8/11 [00:01<00:00,  5.57it/s]
Caching latents to disk:  82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 9/11 [00:01<00:00,  5.52it/s]
Caching latents to disk:  91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 10/11 [00:01<00:00,  5.51it/s]
Caching latents to disk: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 11/11 [00:02<00:00,  5.56it/s]
Caching latents to disk: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 11/11 [00:02<00:00,  5.48it/s]
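
Because cache_latents_to_disk is true, every image is pushed through the VAE once per bucket and the latent is saved, so the encoder never has to run inside the training loop. A minimal sketch, assuming a diffusers-style AutoencoderKL (not ai-toolkit's actual cache format):

    import os
    import torch

    @torch.no_grad()
    def cache_latents(vae, dataloader, cache_dir):
        """Encode each batch once and save one latent tensor per image."""
        os.makedirs(cache_dir, exist_ok=True)
        for pixels, names in dataloader:                    # pixels: (B,3,H,W) in [-1,1]
            latents = vae.encode(pixels).latent_dist.sample()
            latents = latents * vae.config.scaling_factor   # diffusers convention
            for latent, name in zip(latents, names):
                torch.save(latent.cpu(), os.path.join(cache_dir, f"{name}.pt"))
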
Generating baseline samples before training

Generating Images:   0%|          | 0/10 [00:00<?, ?it/s]
Generating Images:  10%|β–ˆ         | 1/10 [01:26<12:56, 86.27s/it]
Generating Images:  20%|β–ˆβ–ˆ        | 2/10 [02:04<07:44, 58.03s/it]
Generating Images:  30%|β–ˆβ–ˆβ–ˆ       | 3/10 [02:43<05:44, 49.28s/it]
Generating Images:  40%|β–ˆβ–ˆβ–ˆβ–ˆ      | 4/10 [03:22<04:31, 45.27s/it]
Generating Images:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ     | 5/10 [04:01<03:35, 43.08s/it]
Generating Images:  60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ    | 6/10 [04:40<02:46, 41.73s/it]
Generating Images:  70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ   | 7/10 [05:19<02:02, 40.85s/it]
Generating Images:  80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ  | 8/10 [05:58<01:20, 40.23s/it]
Generating Images:  90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 9/10 [06:37<00:39, 39.82s/it]
Generating Images: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 10/10 [07:16<00:00, 39.53s/it]
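
The ten baseline images use the config's flowmatch sampler with 20 steps and guidance 4, generated before any LoRA weights have trained, so they document the raw model. A minimal sketch of flow-matching (rectified-flow) sampling by Euler integration of the learned velocity from t=1 (noise) down to t=0 (the model call signature is hypothetical):

    import torch

    @torch.no_grad()
    def sample(model, cond, shape, steps=20, device="cuda"):
        """Integrate dx/dt = v(x, t, cond) from pure noise at t=1 down to t=0."""
        x = torch.randn(shape, device=device)
        ts = torch.linspace(1.0, 0.0, steps + 1, device=device)
        for t, t_next in zip(ts[:-1], ts[1:]):
            v = model(x, t.expand(shape[0]), cond)   # predicted velocity (noise - data)
            x = x + (t_next - t) * v                 # Euler step; t_next < t
        return x
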
                                                                  

my_first_flux_lora_v1:   0%|          | 0/3000 [00:00<?, ?it/s]
my_first_flux_lora_v1:   0%|          | 0/3000 [00:04<?, ?it/s, lr: 1.0e-04 loss: 4.015e-01]
my_first_flux_lora_v1:   0%|          | 0/3000 [00:04<?, ?it/s, lr: 1.0e-04 loss: 4.015e-01]
my_first_flux_lora_v1:   0%|          | 0/3000 [00:10<?, ?it/s, lr: 1.0e-04 loss: 4.988e-01]
my_first_flux_lora_v1:   0%|          | 1/3000 [00:10<4:56:11,  5.93s/it, lr: 1.0e-04 loss: 4.988e-01]
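
Training proper then begins: 3000 steps at lr 1.0e-04 with the flowmatch objective, the loss sitting around 0.4-0.5 at step 1. A minimal sketch of one flow-matching training step on cached latents (the transformer call signature is hypothetical):

    import torch
    import torch.nn.functional as F

    def flowmatch_step(model, latents, cond):
        """Regress the constant velocity (noise - latents) along a linear path."""
        noise = torch.randn_like(latents)
        t = torch.rand(latents.shape[0], device=latents.device)  # uniform timesteps
        tb = t.view(-1, 1, 1, 1)
        x_t = (1.0 - tb) * latents + tb * noise                  # data -> noise path
        target = noise - latents
        pred = model(x_t, t, cond)                               # hypothetical signature
        return F.mse_loss(pred, target)
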