2024-02-12,02:49:55 | INFO | Running with a single process. Device cuda:0.
2024-02-12,02:49:55 | INFO | Loaded ViT-B-32 model config.
2024-02-12,02:49:58 | INFO | Loading pretrained ViT-B-32 weights (laion2b_s34b_b79k).
2024-02-12,02:49:58 | INFO | Model:
2024-02-12,02:49:58 | INFO | CLIP(
  (visual): VisionTransformer(
    (conv1): Conv2d(3, 768, kernel_size=(32, 32), stride=(32, 32), bias=False)
    (patch_dropout): Identity()
    (ln_pre): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (transformer): Transformer(
      (resblocks): ModuleList(
        (0-11): 12 x ResidualAttentionBlock(
          (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (attn): MultiheadAttention(
            (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
          )
          (ls_1): Identity()
          (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (mlp): Sequential(
            (c_fc): Linear(in_features=768, out_features=3072, bias=True)
            (gelu): GELU(approximate='none')
            (c_proj): Linear(in_features=3072, out_features=768, bias=True)
          )
          (ls_2): Identity()
        )
      )
    )
    (ln_post): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (transformer): Transformer(
    (resblocks): ModuleList(
      (0-11): 12 x ResidualAttentionBlock(
        (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (attn): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
        )
        (ls_1): Identity()
        (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (mlp): Sequential(
          (c_fc): Linear(in_features=512, out_features=2048, bias=True)
          (gelu): GELU(approximate='none')
          (c_proj): Linear(in_features=2048, out_features=512, bias=True)
        )
        (ls_2): Identity()
      )
    )
  )
  (token_embedding): Embedding(49408, 512)
  (ln_final): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
2024-02-12,02:49:58 | INFO | Params:
2024-02-12,02:49:58 | INFO | accum_freq: 1
2024-02-12,02:49:58 | INFO | aug_cfg: {}
2024-02-12,02:49:58 | INFO | batch_size: 256
2024-02-12,02:49:58 | INFO | beta1: 0.9
2024-02-12,02:49:58 | INFO | beta2: 0.98
2024-02-12,02:49:58 | INFO | checkpoint_path: ./logs/2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16/checkpoints
2024-02-12,02:49:58 | INFO | coca_caption_loss_weight: 2.0
2024-02-12,02:49:58 | INFO | coca_contrastive_loss_weight: 1.0
2024-02-12,02:49:58 | INFO | copy_codebase: False
2024-02-12,02:49:58 | INFO | csv_caption_key: captions
2024-02-12,02:49:58 | INFO | csv_img_key: images
2024-02-12,02:49:58 | INFO | csv_separator:
2024-02-12,02:49:58 | INFO | dataset_resampled: False
2024-02-12,02:49:58 | INFO | dataset_type: auto
2024-02-12,02:49:58 | INFO | ddp_static_graph: True
2024-02-12,02:49:58 | INFO | debug: False
2024-02-12,02:49:58 | INFO | delete_previous_checkpoint: False
2024-02-12,02:49:58 | INFO | device: cuda:0
2024-02-12,02:49:58 | INFO | dist_backend: nccl
2024-02-12,02:49:58 | INFO | dist_url: env://
2024-02-12,02:49:58 | INFO | distill: False
2024-02-12,02:49:58 | INFO | distill_model: None
2024-02-12,02:49:58 | INFO | distill_pretrained: None
2024-02-12,02:49:58 | INFO | distributed: False
2024-02-12,02:49:58 | INFO | epochs: 5
2024-02-12,02:49:58 | INFO | epochs_cooldown: None
2024-02-12,02:49:58 | INFO | eps: 1e-06
2024-02-12,02:49:58 | INFO | force_custom_text: False
2024-02-12,02:49:58 | INFO | force_image_size: None
2024-02-12,02:49:58 | INFO | force_patch_dropout: None
2024-02-12,02:49:58 | INFO | force_quick_gelu: False
2024-02-12,02:49:58 | INFO | gather_with_grad: True
2024-02-12,02:49:58 | INFO | grad_checkpointing: False
2024-02-12,02:49:58 | INFO | grad_clip_norm: None
2024-02-12,02:49:58 | INFO | horovod: False
2024-02-12,02:49:58 | INFO | image_interpolation: None
2024-02-12,02:49:58 | INFO | image_mean: None
2024-02-12,02:49:58 | INFO | image_resize_mode: None
2024-02-12,02:49:58 | INFO | image_std: None
2024-02-12,02:49:58 | INFO | imagenet_v2: None
2024-02-12,02:49:58 | INFO | imagenet_val: None
2024-02-12,02:49:58 | INFO | local_loss: True
2024-02-12,02:49:58 | INFO | local_rank: 0
2024-02-12,02:49:58 | INFO | lock_image: False
2024-02-12,02:49:58 | INFO | lock_image_freeze_bn_stats: False
2024-02-12,02:49:58 | INFO | lock_image_unlocked_groups: 0
2024-02-12,02:49:58 | INFO | lock_text: False
2024-02-12,02:49:58 | INFO | lock_text_freeze_layer_norm: False
2024-02-12,02:49:58 | INFO | lock_text_unlocked_layers: 0
2024-02-12,02:49:58 | INFO | log_every_n_steps: 100
2024-02-12,02:49:58 | INFO | log_level: 20
2024-02-12,02:49:58 | INFO | log_local: False
2024-02-12,02:49:58 | INFO | log_path: ./logs/2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16/out.log
2024-02-12,02:49:58 | INFO | logs: ./logs/
2024-02-12,02:49:58 | INFO | lr: 1e-05
2024-02-12,02:49:58 | INFO | lr_cooldown_end: 0.0
2024-02-12,02:49:58 | INFO | lr_cooldown_power: 1.0
2024-02-12,02:49:58 | INFO | lr_scheduler: cosine
2024-02-12,02:49:58 | INFO | model: ViT-B-32
2024-02-12,02:49:58 | INFO | name: 2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16
2024-02-12,02:49:58 | INFO | no_set_device_rank: False
2024-02-12,02:49:58 | INFO | precision: amp_bf16
2024-02-12,02:49:58 | INFO | pretrained: laion2b_s34b_b79k
2024-02-12,02:49:58 | INFO | pretrained_image: False
2024-02-12,02:49:58 | INFO | rank: 0
2024-02-12,02:49:58 | INFO | remote_sync: None
2024-02-12,02:49:58 | INFO | remote_sync_frequency: 300
2024-02-12,02:49:58 | INFO | remote_sync_protocol: s3
2024-02-12,02:49:58 | INFO | report_to:
2024-02-12,02:49:58 | INFO | resume: None
2024-02-12,02:49:58 | INFO | save_frequency: 5
2024-02-12,02:49:58 | INFO | save_most_recent: False
2024-02-12,02:49:58 | INFO | seed: 0
2024-02-12,02:49:58 | INFO | siglip: False
2024-02-12,02:49:58 | INFO | skip_scheduler: False
2024-02-12,02:49:58 | INFO | tensorboard: False
2024-02-12,02:49:58 | INFO | tensorboard_path:
2024-02-12,02:49:58 | INFO | torchcompile: False
2024-02-12,02:49:58 | INFO | torchscript: False
2024-02-12,02:49:58 | INFO | trace: False
2024-02-12,02:49:58 | INFO | train_data: ../../train_data_counterfactuals_neg_clip2.csv
2024-02-12,02:49:58 | INFO | train_data_upsampling_factors: None
2024-02-12,02:49:58 | INFO | train_num_samples: None
2024-02-12,02:49:58 | INFO | use_bn_sync: False
2024-02-12,02:49:58 | INFO | use_bnb_linear: None
2024-02-12,02:49:58 | INFO | val_data: None
2024-02-12,02:49:58 | INFO | val_frequency: 5
2024-02-12,02:49:58 | INFO | val_num_samples: None
2024-02-12,02:49:58 | INFO | wandb: False
2024-02-12,02:49:58 | INFO | wandb_notes:
2024-02-12,02:49:58 | INFO | wandb_project_name: open-clip
2024-02-12,02:49:58 | INFO | warmup: 1024
2024-02-12,02:49:58 | INFO | wd: 0.2
2024-02-12,02:49:58 | INFO | workers: 8
2024-02-12,02:49:58 | INFO | world_size: 1
2024-02-12,02:49:58 | INFO | zeroshot_frequency: 5
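The parameters above describe a single-GPU fine-tune of the pretrained laion2b_s34b_b79k ViT-B-32 checkpoint for 5 epochs at batch size 256 with bf16 autocast, AdamW, a cosine schedule, and 1024 warmup steps. As a minimal illustrative sketch only (the run itself was launched through open_clip's training script, which also handles the schedule, autocast, and weight-decay exclusions for biases and norm gains), the logged model and optimizer settings map onto the open_clip / PyTorch APIs roughly as follows:

import torch
import open_clip

# model: ViT-B-32, pretrained: laion2b_s34b_b79k, device: cuda:0
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-32', pretrained='laion2b_s34b_b79k')
tokenizer = open_clip.get_tokenizer('ViT-B-32')
model = model.to('cuda:0')

# lr, beta1/beta2, eps, and wd as reported in the params dump above
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-5,
    betas=(0.9, 0.98),
    eps=1e-6,
    weight_decay=0.2,
)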
2024-02-12,02:49:58 | INFO | Start epoch 0
2024-02-12,02:50:15 | INFO | Train Epoch: 0 [ 1024/27087 (1%)] Data (t): 12.525 Batch (t): 16.592, 15.4295/s, 15.4295/s/gpu LR: 0.000000 Logit Scale: 100.000 Contrastive_loss: 1.0551 (1.0551) Loss: 1.0551 (1.0551)
2024-02-12,02:52:13 | INFO | Train Epoch: 0 [103424/27087 (96%)] Data (t): 0.645 Batch (t): 1.175, 459.500/s, 459.500/s/gpu LR: 0.000001 Logit Scale: 99.996 Contrastive_loss: 0.80440 (0.92975) Loss: 0.80440 (0.92975)
2024-02-12,02:52:20 | INFO | Train Epoch: 0 [107520/27087 (100%)] Data (t): 1.439 Batch (t): 1.884, 43.6989/s, 43.6989/s/gpu LR: 0.000001 Logit Scale: 99.996 Contrastive_loss: 0.73623 (0.86524) Loss: 0.73623 (0.86524)
2024-02-12,02:52:21 | INFO | Start epoch 1
2024-02-12,02:52:33 | INFO | Train Epoch: 1 [ 1024/27087 (1%)] Data (t): 11.817 Batch (t): 12.154, 21.0639/s, 21.0639/s/gpu LR: 0.000001 Logit Scale: 99.995 Contrastive_loss: 0.75390 (0.75390) Loss: 0.75390 (0.75390)
2024-02-12,02:54:37 | INFO | Train Epoch: 1 [103424/27087 (96%)] Data (t): 0.740 Batch (t): 1.238, 460.135/s, 460.135/s/gpu LR: 0.000002 Logit Scale: 99.988 Contrastive_loss: 0.65958 (0.70674) Loss: 0.65958 (0.70674)
2024-02-12,02:54:39 | INFO | Train Epoch: 1 [107520/27087 (100%)] Data (t): 0.058 Batch (t): 0.557, 459.304/s, 459.304/s/gpu LR: 0.000002 Logit Scale: 99.988 Contrastive_loss: 0.64635 (0.68661) Loss: 0.64635 (0.68661)
2024-02-12,02:54:39 | INFO | Start epoch 2
2024-02-12,02:54:51 | INFO | Train Epoch: 2 [ 1024/27087 (1%)] Data (t): 11.166 Batch (t): 11.505, 22.2512/s, 22.2512/s/gpu LR: 0.000002 Logit Scale: 99.988 Contrastive_loss: 0.53999 (0.53999) Loss: 0.53999 (0.53999)
2024-02-12,02:56:51 | INFO | Train Epoch: 2 [103424/27087 (96%)] Data (t): 0.696 Batch (t): 1.195, 459.292/s, 459.292/s/gpu LR: 0.000003 Logit Scale: 99.983 Contrastive_loss: 0.56759 (0.55379) Loss: 0.56759 (0.55379)
2024-02-12,02:56:54 | INFO | Train Epoch: 2 [107520/27087 (100%)] Data (t): 0.387 Batch (t): 0.888, 457.597/s, 457.597/s/gpu LR: 0.000003 Logit Scale: 99.983 Contrastive_loss: 0.48756 (0.53171) Loss: 0.48756 (0.53171)
2024-02-12,02:56:55 | INFO | Start epoch 3
2024-02-12,02:57:07 | INFO | Train Epoch: 3 [ 1024/27087 (1%)] Data (t): 11.677 Batch (t): 12.022, 21.2941/s, 21.2941/s/gpu LR: 0.000003 Logit Scale: 99.983 Contrastive_loss: 0.44987 (0.44987) Loss: 0.44987 (0.44987)
2024-02-12,02:59:10 | INFO | Train Epoch: 3 [103424/27087 (96%)] Data (t): 0.718 Batch (t): 1.230, 459.886/s, 459.886/s/gpu LR: 0.000004 Logit Scale: 99.981 Contrastive_loss: 0.42789 (0.43888) Loss: 0.42789 (0.43888)
2024-02-12,02:59:12 | INFO | Train Epoch: 3 [107520/27087 (100%)] Data (t): 0.058 Batch (t): 0.558, 459.170/s, 459.170/s/gpu LR: 0.000004 Logit Scale: 99.980 Contrastive_loss: 0.42664 (0.43480) Loss: 0.42664 (0.43480)
2024-02-12,02:59:12 | INFO | Start epoch 4
2024-02-12,02:59:24 | INFO | Train Epoch: 4 [ 1024/27087 (1%)] Data (t): 11.325 Batch (t): 11.659, 21.9575/s, 21.9575/s/gpu LR: 0.000004 Logit Scale: 99.980 Contrastive_loss: 0.34311 (0.34311) Loss: 0.34311 (0.34311)
2024-02-12,03:01:24 | INFO | Train Epoch: 4 [103424/27087 (96%)] Data (t): 0.712 Batch (t): 1.198, 459.840/s, 459.840/s/gpu LR: 0.000005 Logit Scale: 99.989 Contrastive_loss: 0.32785 (0.33548) Loss: 0.32785 (0.33548)
2024-02-12,03:01:27 | INFO | Train Epoch: 4 [107520/27087 (100%)] Data (t): 0.180 Batch (t): 0.623, 313.004/s, 313.004/s/gpu LR: 0.000005 Logit Scale: 99.989 Contrastive_loss: 0.36298 (0.34464) Loss: 0.36298 (0.34464)
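For reference, the Contrastive_loss column (the value in parentheses is the running average within the epoch) is the symmetric CLIP objective over the in-batch image/text pairs, and Logit Scale is the learned inverse temperature that multiplies the cosine similarities, reported here near its clamp value of 100. A minimal sketch of that computation, not the exact open_clip loss implementation:

import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, logit_scale):
    # Cosine similarities between every image and every text in the batch.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits_per_image = logit_scale * image_features @ text_features.t()
    logits_per_text = logits_per_image.t()
    # The matching pair for each sample sits on the diagonal.
    labels = torch.arange(image_features.shape[0], device=image_features.device)
    return (F.cross_entropy(logits_per_image, labels)
            + F.cross_entropy(logits_per_text, labels)) / 2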