2024-02-13,00:02:16 | INFO | Running with a single process. Device cuda:0.
2024-02-13,00:02:16 | INFO | Loaded ViT-B-32 model config.
2024-02-13,00:02:18 | INFO | Loading pretrained ViT-B-32 weights (laion2b_s34b_b79k).
2024-02-13,00:02:18 | INFO | Model:
2024-02-13,00:02:18 | INFO | CLIP(
  (visual): VisionTransformer(
    (conv1): Conv2d(3, 768, kernel_size=(32, 32), stride=(32, 32), bias=False)
    (patch_dropout): Identity()
    (ln_pre): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (transformer): Transformer(
      (resblocks): ModuleList(
        (0-11): 12 x ResidualAttentionBlock(
          (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (attn): MultiheadAttention(
            (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
          )
          (ls_1): Identity()
          (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (mlp): Sequential(
            (c_fc): Linear(in_features=768, out_features=3072, bias=True)
            (gelu): GELU(approximate='none')
            (c_proj): Linear(in_features=3072, out_features=768, bias=True)
          )
          (ls_2): Identity()
        )
      )
    )
    (ln_post): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (transformer): Transformer(
    (resblocks): ModuleList(
      (0-11): 12 x ResidualAttentionBlock(
        (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (attn): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
        )
        (ls_1): Identity()
        (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (mlp): Sequential(
          (c_fc): Linear(in_features=512, out_features=2048, bias=True)
          (gelu): GELU(approximate='none')
          (c_proj): Linear(in_features=2048, out_features=512, bias=True)
        )
        (ls_2): Identity()
      )
    )
  )
  (token_embedding): Embedding(49408, 512)
  (ln_final): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
2024-02-13,00:02:18 | INFO | Params:
2024-02-13,00:02:18 | INFO |   accum_freq: 1
2024-02-13,00:02:18 | INFO |   aug_cfg: {}
2024-02-13,00:02:18 | INFO |   batch_size: 512
2024-02-13,00:02:18 | INFO |   beta1: 0.9
2024-02-13,00:02:18 | INFO |   beta2: 0.98
2024-02-13,00:02:18 | INFO |   checkpoint_path: ./logs/2024_02_13-00_02_16-model_ViT-B-32-lr_5e-05-b_512-j_8-p_amp_bf16/checkpoints
2024-02-13,00:02:18 | INFO |   coca_caption_loss_weight: 2.0
2024-02-13,00:02:18 | INFO |   coca_contrastive_loss_weight: 1.0
2024-02-13,00:02:18 | INFO |   copy_codebase: False
2024-02-13,00:02:18 | INFO |   csv_caption_key: captions
2024-02-13,00:02:18 | INFO |   csv_img_key: images
2024-02-13,00:02:18 | INFO |   csv_separator: 	
2024-02-13,00:02:18 | INFO |   dataset_resampled: False
2024-02-13,00:02:18 | INFO |   dataset_type: auto
2024-02-13,00:02:18 | INFO |   ddp_static_graph: True
2024-02-13,00:02:18 | INFO |   debug: False
2024-02-13,00:02:18 | INFO |   delete_previous_checkpoint: False
2024-02-13,00:02:18 | INFO |   device: cuda:0
2024-02-13,00:02:18 | INFO |   dist_backend: nccl
2024-02-13,00:02:18 | INFO |   dist_url: env://
2024-02-13,00:02:18 | INFO |   distill: False
2024-02-13,00:02:18 | INFO |   distill_model: None
2024-02-13,00:02:18 | INFO |   distill_pretrained: None
2024-02-13,00:02:18 | INFO |   distributed: False
2024-02-13,00:02:18 | INFO |   epochs: 15
2024-02-13,00:02:18 | INFO |   epochs_cooldown: None
2024-02-13,00:02:18 | INFO |   eps: 1e-06
2024-02-13,00:02:18 | INFO |   force_custom_text: False
2024-02-13,00:02:18 | INFO |   force_image_size: None
2024-02-13,00:02:18 | INFO |   force_patch_dropout: None
2024-02-13,00:02:18 | INFO |   force_quick_gelu: False
2024-02-13,00:02:18 | INFO |   gather_with_grad: True
2024-02-13,00:02:18 | INFO |   grad_checkpointing: False
2024-02-13,00:02:18 | INFO |   grad_clip_norm: None
2024-02-13,00:02:18 | INFO |   horovod: False
2024-02-13,00:02:18 | INFO |   image_interpolation: None
2024-02-13,00:02:18 | INFO |   image_mean: None
2024-02-13,00:02:18 | INFO |   image_resize_mode: None
2024-02-13,00:02:18 | INFO |   image_std: None
2024-02-13,00:02:18 | INFO |   imagenet_v2: None
2024-02-13,00:02:18 | INFO |   imagenet_val: None
2024-02-13,00:02:18 | INFO |   local_loss: True
2024-02-13,00:02:18 | INFO |   local_rank: 0
2024-02-13,00:02:18 | INFO |   lock_image: False
2024-02-13,00:02:18 | INFO |   lock_image_freeze_bn_stats: False
2024-02-13,00:02:18 | INFO |   lock_image_unlocked_groups: 0
2024-02-13,00:02:18 | INFO |   lock_text: False
2024-02-13,00:02:18 | INFO |   lock_text_freeze_layer_norm: False
2024-02-13,00:02:18 | INFO |   lock_text_unlocked_layers: 0
2024-02-13,00:02:18 | INFO |   log_every_n_steps: 100
2024-02-13,00:02:18 | INFO |   log_level: 20
2024-02-13,00:02:18 | INFO |   log_local: False
2024-02-13,00:02:18 | INFO |   log_path: ./logs/2024_02_13-00_02_16-model_ViT-B-32-lr_5e-05-b_512-j_8-p_amp_bf16/out.log
2024-02-13,00:02:18 | INFO |   logs: ./logs/
2024-02-13,00:02:18 | INFO |   lr: 5e-05
2024-02-13,00:02:18 | INFO |   lr_cooldown_end: 0.0
2024-02-13,00:02:18 | INFO |   lr_cooldown_power: 1.0
2024-02-13,00:02:18 | INFO |   lr_scheduler: cosine
2024-02-13,00:02:18 | INFO |   model: ViT-B-32
2024-02-13,00:02:18 | INFO |   name: 2024_02_13-00_02_16-model_ViT-B-32-lr_5e-05-b_512-j_8-p_amp_bf16
2024-02-13,00:02:18 | INFO |   no_set_device_rank: False
2024-02-13,00:02:18 | INFO |   precision: amp_bf16
2024-02-13,00:02:18 | INFO |   pretrained: laion2b_s34b_b79k
2024-02-13,00:02:18 | INFO |   pretrained_image: False
2024-02-13,00:02:18 | INFO |   rank: 0
2024-02-13,00:02:18 | INFO |   remote_sync: None
2024-02-13,00:02:18 | INFO |   remote_sync_frequency: 300
2024-02-13,00:02:18 | INFO |   remote_sync_protocol: s3
2024-02-13,00:02:18 | INFO |   report_to: 
2024-02-13,00:02:18 | INFO |   resume: None
2024-02-13,00:02:18 | INFO |   save_frequency: 5
2024-02-13,00:02:18 | INFO |   save_most_recent: False
2024-02-13,00:02:18 | INFO |   seed: 0
2024-02-13,00:02:18 | INFO |   siglip: False
2024-02-13,00:02:18 | INFO |   skip_scheduler: False
2024-02-13,00:02:18 | INFO |   tensorboard: False
2024-02-13,00:02:18 | INFO |   tensorboard_path: 
2024-02-13,00:02:18 | INFO |   torchcompile: False
2024-02-13,00:02:18 | INFO |   torchscript: False
2024-02-13,00:02:18 | INFO |   trace: False
2024-02-13,00:02:18 | INFO |   train_data: ../../train_data_counting_neg_clip.csv
2024-02-13,00:02:18 | INFO |   train_data_upsampling_factors: None
2024-02-13,00:02:18 | INFO |   train_num_samples: None
2024-02-13,00:02:18 | INFO |   use_bn_sync: False
2024-02-13,00:02:18 | INFO |   use_bnb_linear: None
2024-02-13,00:02:18 | INFO |   val_data: None
2024-02-13,00:02:18 | INFO |   val_frequency: 5
2024-02-13,00:02:18 | INFO |   val_num_samples: None
2024-02-13,00:02:18 | INFO |   wandb: False
2024-02-13,00:02:18 | INFO |   wandb_notes: 
2024-02-13,00:02:18 | INFO |   wandb_project_name: open-clip
2024-02-13,00:02:18 | INFO |   warmup: 1024
2024-02-13,00:02:18 | INFO |   wd: 0.2
2024-02-13,00:02:18 | INFO |   workers: 8
2024-02-13,00:02:18 | INFO |   world_size: 1
2024-02-13,00:02:18 | INFO |   zeroshot_frequency: 5
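The params block above fully determines this run. As a reference, here is a hedged reconstruction of the launch command, using only the non-default values logged above mapped onto open_clip's CLI flag names (underscores become dashes); the module path (`training.main`) matches open_clip checkouts from this period but may differ in other versions, and this is a sketch, not the verbatim command used:

```shell
# Hypothetical reconstruction of the run from the logged params above.
# All flag values are taken directly from the "Params:" dump in this log.
python -m training.main \
  --model ViT-B-32 \
  --pretrained laion2b_s34b_b79k \
  --train-data ../../train_data_counting_neg_clip.csv \
  --csv-img-key images \
  --csv-caption-key captions \
  --batch-size 512 \
  --lr 5e-5 \
  --wd 0.2 \
  --epochs 15 \
  --warmup 1024 \
  --precision amp_bf16 \
  --workers 8 \
  --local-loss \
  --gather-with-grad \
  --ddp-static-graph \
  --save-frequency 5 \
  --val-frequency 5 \
  --zeroshot-frequency 5 \
  --logs ./logs/
```

Flags left at their defaults in the log (e.g. `seed: 0`, `beta1: 0.9`, `beta2: 0.98`, `eps: 1e-06`, `lr_scheduler: cosine`) are omitted, since open_clip applies them automatically.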
2024-02-13,00:02:18 | INFO | Start epoch 0
2024-02-13,00:02:32 | INFO | Train Epoch: 0 [ 1024/10010 (5%)] Data (t): 7.751 Batch (t): 13.065, 39.1899/s, 39.1899/s/gpu LR: 0.000000 Logit Scale: 100.000 Contrastive_loss: 6.4139 (6.4139) Loss: 6.4139 (6.4139)
2024-02-13,00:02:42 | INFO | Train Epoch: 0 [19456/10010 (100%)] Data (t): 0.001 Batch (t): 0.583, 874.223/s, 874.223/s/gpu LR: 0.000001 Logit Scale: 99.999 Contrastive_loss: 5.2933 (5.8536) Loss: 5.2933 (5.8536)
2024-02-13,00:02:42 | INFO | Start epoch 1
2024-02-13,00:02:49 | INFO | Train Epoch: 1 [ 1024/10010 (5%)] Data (t): 6.269 Batch (t): 6.642, 77.0819/s, 77.0819/s/gpu LR: 0.000001 Logit Scale: 99.999 Contrastive_loss: 5.0906 (5.0906) Loss: 5.0906 (5.0906)
2024-02-13,00:03:00 | INFO | Train Epoch: 1 [19456/10010 (100%)] Data (t): 0.013 Batch (t): 0.588, 864.032/s, 864.032/s/gpu LR: 0.000002 Logit Scale: 99.997 Contrastive_loss: 4.3162 (4.7034) Loss: 4.3162 (4.7034)
2024-02-13,00:03:00 | INFO | Start epoch 2
2024-02-13,00:03:08 | INFO | Train Epoch: 2 [ 1024/10010 (5%)] Data (t): 7.151 Batch (t): 7.527, 68.0199/s, 68.0199/s/gpu LR: 0.000002 Logit Scale: 99.997 Contrastive_loss: 4.1381 (4.1381) Loss: 4.1381 (4.1381)
2024-02-13,00:03:19 | INFO | Train Epoch: 2 [19456/10010 (100%)] Data (t): 0.033 Batch (t): 0.595, 866.703/s, 866.703/s/gpu LR: 0.000003 Logit Scale: 99.996 Contrastive_loss: 3.7141 (3.9261) Loss: 3.7141 (3.9261)
2024-02-13,00:03:19 | INFO | Start epoch 3
2024-02-13,00:03:26 | INFO | Train Epoch: 3 [ 1024/10010 (5%)] Data (t): 6.530 Batch (t): 6.891, 74.2989/s, 74.2989/s/gpu LR: 0.000003 Logit Scale: 99.996 Contrastive_loss: 3.7603 (3.7603) Loss: 3.7603 (3.7603)
2024-02-13,00:03:37 | INFO | Train Epoch: 3 [19456/10010 (100%)] Data (t): 0.045 Batch (t): 0.608, 865.795/s, 865.795/s/gpu LR: 0.000004 Logit Scale: 99.996 Contrastive_loss: 3.2845 (3.5224) Loss: 3.2845 (3.5224)
2024-02-13,00:03:37 | INFO | Start epoch 4
2024-02-13,00:03:44 | INFO | Train Epoch: 4 [ 1024/10010 (5%)] Data (t): 5.669 Batch (t): 6.038, 84.7949/s, 84.7949/s/gpu LR: 0.000004 Logit Scale: 99.996 Contrastive_loss: 3.1494 (3.1494) Loss: 3.1494 (3.1494)
2024-02-13,00:03:55 | INFO | Train Epoch: 4 [19456/10010 (100%)] Data (t): 0.026 Batch (t): 0.605, 864.414/s, 864.414/s/gpu LR: 0.000005 Logit Scale: 99.996 Contrastive_loss: 2.7506 (2.9500) Loss: 2.7506 (2.9500)
2024-02-13,00:03:57 | INFO | Start epoch 5
2024-02-13,00:04:04 | INFO | Train Epoch: 5 [ 1024/10010 (5%)] Data (t): 6.324 Batch (t): 6.699, 76.4267/s, 76.4267/s/gpu LR: 0.000005 Logit Scale: 99.996 Contrastive_loss: 2.7057 (2.7057) Loss: 2.7057 (2.7057)
2024-02-13,00:04:15 | INFO | Train Epoch: 5 [19456/10010 (100%)] Data (t): 0.089 Batch (t): 0.639, 867.588/s, 867.588/s/gpu LR: 0.000006 Logit Scale: 99.996 Contrastive_loss: 2.4321 (2.5689) Loss: 2.4321 (2.5689)
2024-02-13,00:04:16 | INFO | Start epoch 6
2024-02-13,00:04:22 | INFO | Train Epoch: 6 [ 1024/10010 (5%)] Data (t): 6.129 Batch (t): 6.509, 78.6643/s, 78.6643/s/gpu LR: 0.000006 Logit Scale: 99.996 Contrastive_loss: 2.3407 (2.3407) Loss: 2.3407 (2.3407)
2024-02-13,00:04:33 | INFO | Train Epoch: 6 [19456/10010 (100%)] Data (t): 0.013 Batch (t): 0.585, 866.873/s, 866.873/s/gpu LR: 0.000006 Logit Scale: 99.997 Contrastive_loss: 2.2057 (2.2732) Loss: 2.2057 (2.2732)
2024-02-13,00:04:33 | INFO | Start epoch 7
2024-02-13,00:04:40 | INFO | Train Epoch: 7 [ 1024/10010 (5%)] Data (t): 6.374 Batch (t): 6.750, 75.8483/s, 75.8483/s/gpu LR: 0.000007 Logit Scale: 99.997 Contrastive_loss: 1.9728 (1.9728) Loss: 1.9728 (1.9728)
2024-02-13,00:04:51 | INFO | Train Epoch: 7 [19456/10010 (100%)] Data (t): 0.070 Batch (t): 0.629, 865.081/s, 865.081/s/gpu LR: 0.000007 Logit Scale: 99.998 Contrastive_loss: 1.8460 (1.9094) Loss: 1.8460 (1.9094)
2024-02-13,00:04:52 | INFO | Start epoch 8
2024-02-13,00:04:59 | INFO | Train Epoch: 8 [ 1024/10010 (5%)] Data (t): 6.348 Batch (t): 6.724, 76.1504/s, 76.1504/s/gpu LR: 0.000007 Logit Scale: 99.999 Contrastive_loss: 1.6491 (1.6491) Loss: 1.6491 (1.6491)
2024-02-13,00:05:09 | INFO | Train Epoch: 8 [19456/10010 (100%)] Data (t): 0.039 Batch (t): 0.591, 864.632/s, 864.632/s/gpu LR: 0.000008 Logit Scale: 100.000 Contrastive_loss: 1.5005 (1.5748) Loss: 1.5005 (1.5748)
2024-02-13,00:05:10 | INFO | Start epoch 9
2024-02-13,00:05:16 | INFO | Train Epoch: 9 [ 1024/10010 (5%)] Data (t): 5.740 Batch (t): 6.101, 83.9270/s, 83.9270/s/gpu LR: 0.000008 Logit Scale: 100.000 Contrastive_loss: 1.2527 (1.2527) Loss: 1.2527 (1.2527)
2024-02-13,00:05:27 | INFO | Train Epoch: 9 [19456/10010 (100%)] Data (t): 0.029 Batch (t): 0.592, 866.672/s, 866.672/s/gpu LR: 0.000009 Logit Scale: 100.000 Contrastive_loss: 1.1425 (1.1976) Loss: 1.1425 (1.1976)
2024-02-13,00:05:29 | INFO | Start epoch 10
2024-02-13,00:05:35 | INFO | Train Epoch: 10 [ 1024/10010 (5%)] Data (t): 5.614 Batch (t): 5.988, 85.5099/s, 85.5099/s/gpu LR: 0.000009 Logit Scale: 100.000 Contrastive_loss: 0.92603 (0.92603) Loss: 0.92603 (0.92603)
2024-02-13,00:05:46 | INFO | Train Epoch: 10 [19456/10010 (100%)] Data (t): 0.089 Batch (t): 0.636, 866.623/s, 866.623/s/gpu LR: 0.000010 Logit Scale: 100.000 Contrastive_loss: 1.0557 (0.99089) Loss: 1.0557 (0.99089)
2024-02-13,00:05:47 | INFO | Start epoch 11
2024-02-13,00:05:54 | INFO | Train Epoch: 11 [ 1024/10010 (5%)] Data (t): 6.593 Batch (t): 6.953, 73.6390/s, 73.6390/s/gpu LR: 0.000010 Logit Scale: 100.000 Contrastive_loss: 0.75542 (0.75542) Loss: 0.75542 (0.75542)
2024-02-13,00:06:05 | INFO | Train Epoch: 11 [19456/10010 (100%)] Data (t): 0.045 Batch (t): 0.595, 865.436/s, 865.436/s/gpu LR: 0.000011 Logit Scale: 100.000 Contrastive_loss: 0.74945 (0.75243) Loss: 0.74945 (0.75243)
2024-02-13,00:06:05 | INFO | Start epoch 12
2024-02-13,00:06:11 | INFO | Train Epoch: 12 [ 1024/10010 (5%)] Data (t): 5.893 Batch (t): 6.266, 81.7047/s, 81.7047/s/gpu LR: 0.000011 Logit Scale: 100.000 Contrastive_loss: 0.60686 (0.60686) Loss: 0.60686 (0.60686)
2024-02-13,00:06:22 | INFO | Train Epoch: 12 [19456/10010 (100%)] Data (t): 0.015 Batch (t): 0.587, 865.351/s, 865.351/s/gpu LR: 0.000012 Logit Scale: 100.000 Contrastive_loss: 0.62050 (0.61368) Loss: 0.62050 (0.61368)
2024-02-13,00:06:22 | INFO | Start epoch 13
2024-02-13,00:06:30 | INFO | Train Epoch: 13 [ 1024/10010 (5%)] Data (t): 6.973 Batch (t): 7.323, 69.9169/s, 69.9169/s/gpu LR: 0.000012 Logit Scale: 100.000 Contrastive_loss: 0.49629 (0.49629) Loss: 0.49629 (0.49629)
2024-02-13,00:06:41 | INFO | Train Epoch: 13 [19456/10010 (100%)] Data (t): 0.044 Batch (t): 0.595, 874.122/s, 874.122/s/gpu LR: 0.000013 Logit Scale: 100.000 Contrastive_loss: 0.53294 (0.51462) Loss: 0.53294 (0.51462)
2024-02-13,00:06:41 | INFO | Start epoch 14
2024-02-13,00:06:48 | INFO | Train Epoch: 14 [ 1024/10010 (5%)] Data (t): 6.511 Batch (t): 6.872, 74.5086/s, 74.5086/s/gpu LR: 0.000013 Logit Scale: 100.000 Contrastive_loss: 0.45596 (0.45596) Loss: 0.45596 (0.45596)
2024-02-13,00:06:59 | INFO | Train Epoch: 14 [19456/10010 (100%)] Data (t): 0.014 Batch (t): 0.587, 865.782/s, 865.782/s/gpu LR: 0.000014 Logit Scale: 100.000 Contrastive_loss: 0.41602 (0.43599) Loss: 0.41602 (0.43599)