2024-02-13,00:02:16 | INFO | Running with a single process. Device cuda:0.
2024-02-13,00:02:16 | INFO | Loaded ViT-B-32 model config.
2024-02-13,00:02:18 | INFO | Loading pretrained ViT-B-32 weights (laion2b_s34b_b79k).
2024-02-13,00:02:18 | INFO | Model:
2024-02-13,00:02:18 | INFO | CLIP(
(visual): VisionTransformer(
(conv1): Conv2d(3, 768, kernel_size=(32, 32), stride=(32, 32), bias=False)
(patch_dropout): Identity()
(ln_pre): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(transformer): Transformer(
(resblocks): ModuleList(
(0-11): 12 x ResidualAttentionBlock(
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
)
(ls_1): Identity()
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(c_fc): Linear(in_features=768, out_features=3072, bias=True)
(gelu): GELU(approximate='none')
(c_proj): Linear(in_features=3072, out_features=768, bias=True)
)
(ls_2): Identity()
)
)
)
(ln_post): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(transformer): Transformer(
(resblocks): ModuleList(
(0-11): 12 x ResidualAttentionBlock(
(ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
)
(ls_1): Identity()
(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
(mlp): Sequential(
(c_fc): Linear(in_features=512, out_features=2048, bias=True)
(gelu): GELU(approximate='none')
(c_proj): Linear(in_features=2048, out_features=512, bias=True)
)
(ls_2): Identity()
)
)
)
(token_embedding): Embedding(49408, 512)
(ln_final): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
2024-02-13,00:02:18 | INFO | Params:
2024-02-13,00:02:18 | INFO | accum_freq: 1
2024-02-13,00:02:18 | INFO | aug_cfg: {}
2024-02-13,00:02:18 | INFO | batch_size: 512
2024-02-13,00:02:18 | INFO | beta1: 0.9
2024-02-13,00:02:18 | INFO | beta2: 0.98
2024-02-13,00:02:18 | INFO | checkpoint_path: ./logs/2024_02_13-00_02_16-model_ViT-B-32-lr_5e-05-b_512-j_8-p_amp_bf16/checkpoints
2024-02-13,00:02:18 | INFO | coca_caption_loss_weight: 2.0
2024-02-13,00:02:18 | INFO | coca_contrastive_loss_weight: 1.0
2024-02-13,00:02:18 | INFO | copy_codebase: False
2024-02-13,00:02:18 | INFO | csv_caption_key: captions
2024-02-13,00:02:18 | INFO | csv_img_key: images
2024-02-13,00:02:18 | INFO | csv_separator:
2024-02-13,00:02:18 | INFO | dataset_resampled: False
2024-02-13,00:02:18 | INFO | dataset_type: auto
2024-02-13,00:02:18 | INFO | ddp_static_graph: True
2024-02-13,00:02:18 | INFO | debug: False
2024-02-13,00:02:18 | INFO | delete_previous_checkpoint: False
2024-02-13,00:02:18 | INFO | device: cuda:0
2024-02-13,00:02:18 | INFO | dist_backend: nccl
2024-02-13,00:02:18 | INFO | dist_url: env://
2024-02-13,00:02:18 | INFO | distill: False
2024-02-13,00:02:18 | INFO | distill_model: None
2024-02-13,00:02:18 | INFO | distill_pretrained: None
2024-02-13,00:02:18 | INFO | distributed: False
2024-02-13,00:02:18 | INFO | epochs: 15
2024-02-13,00:02:18 | INFO | epochs_cooldown: None
2024-02-13,00:02:18 | INFO | eps: 1e-06
2024-02-13,00:02:18 | INFO | force_custom_text: False
2024-02-13,00:02:18 | INFO | force_image_size: None
2024-02-13,00:02:18 | INFO | force_patch_dropout: None
2024-02-13,00:02:18 | INFO | force_quick_gelu: False
2024-02-13,00:02:18 | INFO | gather_with_grad: True
2024-02-13,00:02:18 | INFO | grad_checkpointing: False
2024-02-13,00:02:18 | INFO | grad_clip_norm: None
2024-02-13,00:02:18 | INFO | horovod: False
2024-02-13,00:02:18 | INFO | image_interpolation: None
2024-02-13,00:02:18 | INFO | image_mean: None
2024-02-13,00:02:18 | INFO | image_resize_mode: None
2024-02-13,00:02:18 | INFO | image_std: None
2024-02-13,00:02:18 | INFO | imagenet_v2: None
2024-02-13,00:02:18 | INFO | imagenet_val: None
2024-02-13,00:02:18 | INFO | local_loss: True
2024-02-13,00:02:18 | INFO | local_rank: 0
2024-02-13,00:02:18 | INFO | lock_image: False
2024-02-13,00:02:18 | INFO | lock_image_freeze_bn_stats: False
2024-02-13,00:02:18 | INFO | lock_image_unlocked_groups: 0
2024-02-13,00:02:18 | INFO | lock_text: False
2024-02-13,00:02:18 | INFO | lock_text_freeze_layer_norm: False
2024-02-13,00:02:18 | INFO | lock_text_unlocked_layers: 0
2024-02-13,00:02:18 | INFO | log_every_n_steps: 100
2024-02-13,00:02:18 | INFO | log_level: 20
2024-02-13,00:02:18 | INFO | log_local: False
2024-02-13,00:02:18 | INFO | log_path: ./logs/2024_02_13-00_02_16-model_ViT-B-32-lr_5e-05-b_512-j_8-p_amp_bf16/out.log
2024-02-13,00:02:18 | INFO | logs: ./logs/
2024-02-13,00:02:18 | INFO | lr: 5e-05
2024-02-13,00:02:18 | INFO | lr_cooldown_end: 0.0
2024-02-13,00:02:18 | INFO | lr_cooldown_power: 1.0
2024-02-13,00:02:18 | INFO | lr_scheduler: cosine
2024-02-13,00:02:18 | INFO | model: ViT-B-32
2024-02-13,00:02:18 | INFO | name: 2024_02_13-00_02_16-model_ViT-B-32-lr_5e-05-b_512-j_8-p_amp_bf16
2024-02-13,00:02:18 | INFO | no_set_device_rank: False
2024-02-13,00:02:18 | INFO | precision: amp_bf16
2024-02-13,00:02:18 | INFO | pretrained: laion2b_s34b_b79k
2024-02-13,00:02:18 | INFO | pretrained_image: False
2024-02-13,00:02:18 | INFO | rank: 0
2024-02-13,00:02:18 | INFO | remote_sync: None
2024-02-13,00:02:18 | INFO | remote_sync_frequency: 300
2024-02-13,00:02:18 | INFO | remote_sync_protocol: s3
2024-02-13,00:02:18 | INFO | report_to:
2024-02-13,00:02:18 | INFO | resume: None
2024-02-13,00:02:18 | INFO | save_frequency: 5
2024-02-13,00:02:18 | INFO | save_most_recent: False
2024-02-13,00:02:18 | INFO | seed: 0
2024-02-13,00:02:18 | INFO | siglip: False
2024-02-13,00:02:18 | INFO | skip_scheduler: False
2024-02-13,00:02:18 | INFO | tensorboard: False
2024-02-13,00:02:18 | INFO | tensorboard_path:
2024-02-13,00:02:18 | INFO | torchcompile: False
2024-02-13,00:02:18 | INFO | torchscript: False
2024-02-13,00:02:18 | INFO | trace: False
2024-02-13,00:02:18 | INFO | train_data: ../../train_data_counting_neg_clip.csv
2024-02-13,00:02:18 | INFO | train_data_upsampling_factors: None
2024-02-13,00:02:18 | INFO | train_num_samples: None
2024-02-13,00:02:18 | INFO | use_bn_sync: False
2024-02-13,00:02:18 | INFO | use_bnb_linear: None
2024-02-13,00:02:18 | INFO | val_data: None
2024-02-13,00:02:18 | INFO | val_frequency: 5
2024-02-13,00:02:18 | INFO | val_num_samples: None
2024-02-13,00:02:18 | INFO | wandb: False
2024-02-13,00:02:18 | INFO | wandb_notes:
2024-02-13,00:02:18 | INFO | wandb_project_name: open-clip
2024-02-13,00:02:18 | INFO | warmup: 1024
2024-02-13,00:02:18 | INFO | wd: 0.2
2024-02-13,00:02:18 | INFO | workers: 8
2024-02-13,00:02:18 | INFO | world_size: 1
2024-02-13,00:02:18 | INFO | zeroshot_frequency: 5
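For reproducibility, the parameter dump above corresponds (modulo defaults) to an open_clip training invocation along these lines. This is a hedged reconstruction from the logged params, not the original job script; flag names follow open_clip's training CLI:

```shell
python -m training.main \
    --model ViT-B-32 \
    --pretrained laion2b_s34b_b79k \
    --train-data ../../train_data_counting_neg_clip.csv \
    --dataset-type auto \
    --csv-img-key images \
    --csv-caption-key captions \
    --batch-size 512 \
    --lr 5e-5 \
    --wd 0.2 \
    --beta2 0.98 \
    --warmup 1024 \
    --epochs 15 \
    --precision amp_bf16 \
    --workers 8 \
    --local-loss \
    --gather-with-grad \
    --ddp-static-graph \
    --save-frequency 5 \
    --val-frequency 5 \
    --zeroshot-frequency 5 \
    --logs ./logs/
```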
2024-02-13,00:02:18 | INFO | Start epoch 0
2024-02-13,00:02:32 | INFO | Train Epoch: 0 [ 1024/10010 (5%)] Data (t): 7.751 Batch (t): 13.065, 39.1899/s, 39.1899/s/gpu LR: 0.000000 Logit Scale: 100.000 Contrastive_loss: 6.4139 (6.4139) Loss: 6.4139 (6.4139)
2024-02-13,00:02:42 | INFO | Train Epoch: 0 [19456/10010 (100%)] Data (t): 0.001 Batch (t): 0.583, 874.223/s, 874.223/s/gpu LR: 0.000001 Logit Scale: 99.999 Contrastive_loss: 5.2933 (5.8536) Loss: 5.2933 (5.8536)
2024-02-13,00:02:42 | INFO | Start epoch 1
2024-02-13,00:02:49 | INFO | Train Epoch: 1 [ 1024/10010 (5%)] Data (t): 6.269 Batch (t): 6.642, 77.0819/s, 77.0819/s/gpu LR: 0.000001 Logit Scale: 99.999 Contrastive_loss: 5.0906 (5.0906) Loss: 5.0906 (5.0906)
2024-02-13,00:03:00 | INFO | Train Epoch: 1 [19456/10010 (100%)] Data (t): 0.013 Batch (t): 0.588, 864.032/s, 864.032/s/gpu LR: 0.000002 Logit Scale: 99.997 Contrastive_loss: 4.3162 (4.7034) Loss: 4.3162 (4.7034)
2024-02-13,00:03:00 | INFO | Start epoch 2
2024-02-13,00:03:08 | INFO | Train Epoch: 2 [ 1024/10010 (5%)] Data (t): 7.151 Batch (t): 7.527, 68.0199/s, 68.0199/s/gpu LR: 0.000002 Logit Scale: 99.997 Contrastive_loss: 4.1381 (4.1381) Loss: 4.1381 (4.1381)
2024-02-13,00:03:19 | INFO | Train Epoch: 2 [19456/10010 (100%)] Data (t): 0.033 Batch (t): 0.595, 866.703/s, 866.703/s/gpu LR: 0.000003 Logit Scale: 99.996 Contrastive_loss: 3.7141 (3.9261) Loss: 3.7141 (3.9261)
2024-02-13,00:03:19 | INFO | Start epoch 3
2024-02-13,00:03:26 | INFO | Train Epoch: 3 [ 1024/10010 (5%)] Data (t): 6.530 Batch (t): 6.891, 74.2989/s, 74.2989/s/gpu LR: 0.000003 Logit Scale: 99.996 Contrastive_loss: 3.7603 (3.7603) Loss: 3.7603 (3.7603)
2024-02-13,00:03:37 | INFO | Train Epoch: 3 [19456/10010 (100%)] Data (t): 0.045 Batch (t): 0.608, 865.795/s, 865.795/s/gpu LR: 0.000004 Logit Scale: 99.996 Contrastive_loss: 3.2845 (3.5224) Loss: 3.2845 (3.5224)
2024-02-13,00:03:37 | INFO | Start epoch 4
2024-02-13,00:03:44 | INFO | Train Epoch: 4 [ 1024/10010 (5%)] Data (t): 5.669 Batch (t): 6.038, 84.7949/s, 84.7949/s/gpu LR: 0.000004 Logit Scale: 99.996 Contrastive_loss: 3.1494 (3.1494) Loss: 3.1494 (3.1494)
2024-02-13,00:03:55 | INFO | Train Epoch: 4 [19456/10010 (100%)] Data (t): 0.026 Batch (t): 0.605, 864.414/s, 864.414/s/gpu LR: 0.000005 Logit Scale: 99.996 Contrastive_loss: 2.7506 (2.9500) Loss: 2.7506 (2.9500)
2024-02-13,00:03:57 | INFO | Start epoch 5
2024-02-13,00:04:04 | INFO | Train Epoch: 5 [ 1024/10010 (5%)] Data (t): 6.324 Batch (t): 6.699, 76.4267/s, 76.4267/s/gpu LR: 0.000005 Logit Scale: 99.996 Contrastive_loss: 2.7057 (2.7057) Loss: 2.7057 (2.7057)
2024-02-13,00:04:15 | INFO | Train Epoch: 5 [19456/10010 (100%)] Data (t): 0.089 Batch (t): 0.639, 867.588/s, 867.588/s/gpu LR: 0.000006 Logit Scale: 99.996 Contrastive_loss: 2.4321 (2.5689) Loss: 2.4321 (2.5689)
2024-02-13,00:04:16 | INFO | Start epoch 6
2024-02-13,00:04:22 | INFO | Train Epoch: 6 [ 1024/10010 (5%)] Data (t): 6.129 Batch (t): 6.509, 78.6643/s, 78.6643/s/gpu LR: 0.000006 Logit Scale: 99.996 Contrastive_loss: 2.3407 (2.3407) Loss: 2.3407 (2.3407)
2024-02-13,00:04:33 | INFO | Train Epoch: 6 [19456/10010 (100%)] Data (t): 0.013 Batch (t): 0.585, 866.873/s, 866.873/s/gpu LR: 0.000006 Logit Scale: 99.997 Contrastive_loss: 2.2057 (2.2732) Loss: 2.2057 (2.2732)
2024-02-13,00:04:33 | INFO | Start epoch 7
2024-02-13,00:04:40 | INFO | Train Epoch: 7 [ 1024/10010 (5%)] Data (t): 6.374 Batch (t): 6.750, 75.8483/s, 75.8483/s/gpu LR: 0.000007 Logit Scale: 99.997 Contrastive_loss: 1.9728 (1.9728) Loss: 1.9728 (1.9728)
2024-02-13,00:04:51 | INFO | Train Epoch: 7 [19456/10010 (100%)] Data (t): 0.070 Batch (t): 0.629, 865.081/s, 865.081/s/gpu LR: 0.000007 Logit Scale: 99.998 Contrastive_loss: 1.8460 (1.9094) Loss: 1.8460 (1.9094)
2024-02-13,00:04:52 | INFO | Start epoch 8
2024-02-13,00:04:59 | INFO | Train Epoch: 8 [ 1024/10010 (5%)] Data (t): 6.348 Batch (t): 6.724, 76.1504/s, 76.1504/s/gpu LR: 0.000007 Logit Scale: 99.999 Contrastive_loss: 1.6491 (1.6491) Loss: 1.6491 (1.6491)
2024-02-13,00:05:09 | INFO | Train Epoch: 8 [19456/10010 (100%)] Data (t): 0.039 Batch (t): 0.591, 864.632/s, 864.632/s/gpu LR: 0.000008 Logit Scale: 100.000 Contrastive_loss: 1.5005 (1.5748) Loss: 1.5005 (1.5748)
2024-02-13,00:05:10 | INFO | Start epoch 9
2024-02-13,00:05:16 | INFO | Train Epoch: 9 [ 1024/10010 (5%)] Data (t): 5.740 Batch (t): 6.101, 83.9270/s, 83.9270/s/gpu LR: 0.000008 Logit Scale: 100.000 Contrastive_loss: 1.2527 (1.2527) Loss: 1.2527 (1.2527)
2024-02-13,00:05:27 | INFO | Train Epoch: 9 [19456/10010 (100%)] Data (t): 0.029 Batch (t): 0.592, 866.672/s, 866.672/s/gpu LR: 0.000009 Logit Scale: 100.000 Contrastive_loss: 1.1425 (1.1976) Loss: 1.1425 (1.1976)
2024-02-13,00:05:29 | INFO | Start epoch 10
2024-02-13,00:05:35 | INFO | Train Epoch: 10 [ 1024/10010 (5%)] Data (t): 5.614 Batch (t): 5.988, 85.5099/s, 85.5099/s/gpu LR: 0.000009 Logit Scale: 100.000 Contrastive_loss: 0.92603 (0.92603) Loss: 0.92603 (0.92603)
2024-02-13,00:05:46 | INFO | Train Epoch: 10 [19456/10010 (100%)] Data (t): 0.089 Batch (t): 0.636, 866.623/s, 866.623/s/gpu LR: 0.000010 Logit Scale: 100.000 Contrastive_loss: 1.0557 (0.99089) Loss: 1.0557 (0.99089)
2024-02-13,00:05:47 | INFO | Start epoch 11
2024-02-13,00:05:54 | INFO | Train Epoch: 11 [ 1024/10010 (5%)] Data (t): 6.593 Batch (t): 6.953, 73.6390/s, 73.6390/s/gpu LR: 0.000010 Logit Scale: 100.000 Contrastive_loss: 0.75542 (0.75542) Loss: 0.75542 (0.75542)
2024-02-13,00:06:05 | INFO | Train Epoch: 11 [19456/10010 (100%)] Data (t): 0.045 Batch (t): 0.595, 865.436/s, 865.436/s/gpu LR: 0.000011 Logit Scale: 100.000 Contrastive_loss: 0.74945 (0.75243) Loss: 0.74945 (0.75243)
2024-02-13,00:06:05 | INFO | Start epoch 12
2024-02-13,00:06:11 | INFO | Train Epoch: 12 [ 1024/10010 (5%)] Data (t): 5.893 Batch (t): 6.266, 81.7047/s, 81.7047/s/gpu LR: 0.000011 Logit Scale: 100.000 Contrastive_loss: 0.60686 (0.60686) Loss: 0.60686 (0.60686)
2024-02-13,00:06:22 | INFO | Train Epoch: 12 [19456/10010 (100%)] Data (t): 0.015 Batch (t): 0.587, 865.351/s, 865.351/s/gpu LR: 0.000012 Logit Scale: 100.000 Contrastive_loss: 0.62050 (0.61368) Loss: 0.62050 (0.61368)
2024-02-13,00:06:22 | INFO | Start epoch 13
2024-02-13,00:06:30 | INFO | Train Epoch: 13 [ 1024/10010 (5%)] Data (t): 6.973 Batch (t): 7.323, 69.9169/s, 69.9169/s/gpu LR: 0.000012 Logit Scale: 100.000 Contrastive_loss: 0.49629 (0.49629) Loss: 0.49629 (0.49629)
2024-02-13,00:06:41 | INFO | Train Epoch: 13 [19456/10010 (100%)] Data (t): 0.044 Batch (t): 0.595, 874.122/s, 874.122/s/gpu LR: 0.000013 Logit Scale: 100.000 Contrastive_loss: 0.53294 (0.51462) Loss: 0.53294 (0.51462)
2024-02-13,00:06:41 | INFO | Start epoch 14
2024-02-13,00:06:48 | INFO | Train Epoch: 14 [ 1024/10010 (5%)] Data (t): 6.511 Batch (t): 6.872, 74.5086/s, 74.5086/s/gpu LR: 0.000013 Logit Scale: 100.000 Contrastive_loss: 0.45596 (0.45596) Loss: 0.45596 (0.45596)
2024-02-13,00:06:59 | INFO | Train Epoch: 14 [19456/10010 (100%)] Data (t): 0.014 Batch (t): 0.587, 865.782/s, 865.782/s/gpu LR: 0.000014 Logit Scale: 100.000 Contrastive_loss: 0.41602 (0.43599) Loss: 0.41602 (0.43599)
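The per-step lines above follow a fixed format, so the loss trajectory can be recovered programmatically, e.g. for plotting. A minimal sketch with a regex written against the lines shown here (not part of open_clip itself):

```python
import re

# matches the trailing "Loss: <current> (<running avg>)" field of a train-step line
LOSS_RE = re.compile(r"Train Epoch: (\d+) .* Loss: ([\d.]+) \(([\d.]+)\)$")

def parse_loss(line):
    """Return (epoch, current_loss, running_avg), or None for non-step lines."""
    m = LOSS_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), float(m.group(3))

line = ("2024-02-13,00:06:59 | INFO | Train Epoch: 14 [19456/10010 (100%)] "
        "Data (t): 0.014 Batch (t): 0.587, 865.782/s, 865.782/s/gpu "
        "LR: 0.000014 Logit Scale: 100.000 "
        "Contrastive_loss: 0.41602 (0.43599) Loss: 0.41602 (0.43599)")
print(parse_loss(line))  # -> (14, 0.41602, 0.43599)
```

Feeding the whole log through `parse_loss` and keeping the last match per epoch gives the end-of-epoch running averages (6.41 → 0.436 over the 15 epochs above).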