2024-09-06,22:41:22 | INFO | No latest resume checkpoint found in /home/breaking_0.1_trained/10_most_difficult/checkpoints.
2024-09-06,22:41:24 | INFO | Running in distributed mode with multiple processes. Device: cuda:0. Process (global: 0, local: 0), total: 2.
2024-09-06,22:41:24 | INFO | Loaded ViT-B-32 model config.
2024-09-06,22:41:25 | INFO | Model:
2024-09-06,22:41:25 | INFO | CLIP(
  (visual): VisionTransformer(
    (patchnorm_pre_ln): Identity()
    (conv1): Conv2d(3, 768, kernel_size=(32, 32), stride=(32, 32), bias=False)
    (patch_dropout): Identity()
    (ln_pre): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (transformer): Transformer(
      (resblocks): ModuleList(
        (0-11): 12 x ResidualAttentionBlock(
          (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (attn): MultiheadAttention(
            (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
          )
          (ls_1): Identity()
          (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (mlp): Sequential(
            (c_fc): Linear(in_features=768, out_features=3072, bias=True)
            (gelu): GELU(approximate='none')
            (c_proj): Linear(in_features=3072, out_features=768, bias=True)
          )
          (ls_2): Identity()
        )
      )
    )
    (ln_post): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (transformer): Transformer(
    (resblocks): ModuleList(
      (0-11): 12 x ResidualAttentionBlock(
        (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (attn): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
        )
        (ls_1): Identity()
        (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (mlp): Sequential(
          (c_fc): Linear(in_features=512, out_features=2048, bias=True)
          (gelu): GELU(approximate='none')
          (c_proj): Linear(in_features=2048, out_features=512, bias=True)
        )
        (ls_2): Identity()
      )
    )
  )
  (token_embedding): Embedding(49408, 512)
  (ln_final): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
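The module tree above is the standard open_clip ViT-B-32: a 12-layer, width-768 vision transformer over 32x32 patches and a 12-layer, width-512 text transformer with a 49408-token vocabulary (repeated identical blocks shown once with PyTorch's "(0-11): 12 x" notation). A minimal sketch of instantiating it, assuming the open_clip Python package this log appears to come from:

    import open_clip

    # pretrained is empty in the params below, i.e. the model trains from scratch
    model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms("ViT-B-32")
    print(model)  # prints the CLIP(...) module tree shown above
    print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M params")  # ~151M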
2024-09-06,22:41:25 | INFO | Params:
2024-09-06,22:41:25 | INFO | accum_freq: 1
2024-09-06,22:41:25 | INFO | aug_cfg: {}
2024-09-06,22:41:25 | INFO | batch_size: 2048
2024-09-06,22:41:25 | INFO | beta1: 0.9
2024-09-06,22:41:25 | INFO | beta2: 0.98
2024-09-06,22:41:25 | INFO | checkpoint_path: /home/breaking_0.1_trained/10_most_difficult/checkpoints
2024-09-06,22:41:25 | INFO | coca_caption_loss_weight: 2.0
2024-09-06,22:41:25 | INFO | coca_contrastive_loss_weight: 1.0
2024-09-06,22:41:25 | INFO | copy_codebase: False
2024-09-06,22:41:25 | INFO | csv_caption_key: title
2024-09-06,22:41:25 | INFO | csv_img_key: filepath
2024-09-06,22:41:25 | INFO | csv_separator:
2024-09-06,22:41:25 | INFO | dataset_resampled: True
2024-09-06,22:41:25 | INFO | dataset_type: webdataset
2024-09-06,22:41:25 | INFO | ddp_static_graph: True
2024-09-06,22:41:25 | INFO | debug: False
2024-09-06,22:41:25 | INFO | delete_previous_checkpoint: False
2024-09-06,22:41:25 | INFO | device: cuda:0
2024-09-06,22:41:25 | INFO | dist_backend: nccl
2024-09-06,22:41:25 | INFO | dist_url: env://
2024-09-06,22:41:25 | INFO | distill: False
2024-09-06,22:41:25 | INFO | distill_model: None
2024-09-06,22:41:25 | INFO | distill_pretrained: None
2024-09-06,22:41:25 | INFO | distributed: True
2024-09-06,22:41:25 | INFO | epochs: 5
2024-09-06,22:41:25 | INFO | epochs_cooldown: None
2024-09-06,22:41:25 | INFO | eps: 1e-06
2024-09-06,22:41:25 | INFO | force_custom_text: False
2024-09-06,22:41:25 | INFO | force_image_size: None
2024-09-06,22:41:25 | INFO | force_patch_dropout: None
2024-09-06,22:41:25 | INFO | force_quick_gelu: False
2024-09-06,22:41:25 | INFO | gather_with_grad: True
2024-09-06,22:41:25 | INFO | grad_checkpointing: True
2024-09-06,22:41:25 | INFO | grad_clip_norm: None
2024-09-06,22:41:25 | INFO | horovod: False
2024-09-06,22:41:25 | INFO | image_mean: None
2024-09-06,22:41:25 | INFO | image_std: None
2024-09-06,22:41:25 | INFO | imagenet_v2: None
2024-09-06,22:41:25 | INFO | imagenet_val: None
2024-09-06,22:41:25 | INFO | local_loss: True
2024-09-06,22:41:25 | INFO | local_rank: 0
2024-09-06,22:41:25 | INFO | lock_image: False
2024-09-06,22:41:25 | INFO | lock_image_freeze_bn_stats: False
2024-09-06,22:41:25 | INFO | lock_image_unlocked_groups: 0
2024-09-06,22:41:25 | INFO | lock_text: False
2024-09-06,22:41:25 | INFO | lock_text_freeze_layer_norm: False
2024-09-06,22:41:25 | INFO | lock_text_unlocked_layers: 0
2024-09-06,22:41:25 | INFO | log_every_n_steps: 100
2024-09-06,22:41:25 | INFO | log_level: 20
2024-09-06,22:41:25 | INFO | log_local: False
2024-09-06,22:41:25 | INFO | log_path: /home/breaking_0.1_trained/10_most_difficult/out.log
2024-09-06,22:41:25 | INFO | logs: /home/breaking_0.1_trained
2024-09-06,22:41:25 | INFO | lr: 0.0005
2024-09-06,22:41:25 | INFO | lr_cooldown_end: 0.0
2024-09-06,22:41:25 | INFO | lr_cooldown_power: 1.0
2024-09-06,22:41:25 | INFO | lr_scheduler: cosine
2024-09-06,22:41:25 | INFO | model: ViT-B-32
2024-09-06,22:41:25 | INFO | name: 10_most_difficult
2024-09-06,22:41:25 | INFO | no_set_device_rank: False
2024-09-06,22:41:25 | INFO | precision: amp
2024-09-06,22:41:25 | INFO | pretrained:
2024-09-06,22:41:25 | INFO | pretrained_image: False
2024-09-06,22:41:25 | INFO | rank: 0
2024-09-06,22:41:25 | INFO | remote_sync: None
2024-09-06,22:41:25 | INFO | remote_sync_frequency: 300
2024-09-06,22:41:25 | INFO | remote_sync_protocol: s3
2024-09-06,22:41:25 | INFO | report_to: wandb
2024-09-06,22:41:25 | INFO | resume: None
2024-09-06,22:41:25 | INFO | save_frequency: 0
2024-09-06,22:41:25 | INFO | save_most_recent: True
2024-09-06,22:41:25 | INFO | seed: 0
2024-09-06,22:41:25 | INFO | skip_scheduler: False
2024-09-06,22:41:25 | INFO | tensorboard: False
2024-09-06,22:41:25 | INFO | tensorboard_path:
2024-09-06,22:41:25 | INFO | torchscript: False
2024-09-06,22:41:25 | INFO | trace: False
2024-09-06,22:41:25 | INFO | train_data: /home/breaking_0.1/{00000000..00000127}.tar
2024-09-06,22:41:25 | INFO | train_data_upsampling_factors: None
2024-09-06,22:41:25 | INFO | train_num_samples: 2560000
2024-09-06,22:41:25 | INFO | use_bn_sync: False
2024-09-06,22:41:25 | INFO | val_data: None
2024-09-06,22:41:25 | INFO | val_frequency: 1
2024-09-06,22:41:25 | INFO | val_num_samples: None
2024-09-06,22:41:25 | INFO | wandb: True
2024-09-06,22:41:25 | INFO | wandb_notes:
2024-09-06,22:41:25 | INFO | wandb_project_name: clip_text_hq_clusters
2024-09-06,22:41:25 | INFO | warmup: 500
2024-09-06,22:41:25 | INFO | wd: 0.2
2024-09-06,22:41:25 | INFO | workers: 4
2024-09-06,22:41:25 | INFO | world_size: 2
2024-09-06,22:41:25 | INFO | zeroshot_frequency: 2
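The parameter dump above maps onto open_clip's training CLI roughly as follows (a sketch, not the exact command: the entry-point module name varies between open_clip versions, and only flags that differ from defaults are shown; world_size 2 corresponds to two processes on one node):

    torchrun --nproc_per_node 2 -m open_clip_train.main \
        --model ViT-B-32 \
        --train-data '/home/breaking_0.1/{00000000..00000127}.tar' \
        --dataset-type webdataset --dataset-resampled \
        --train-num-samples 2560000 \
        --batch-size 2048 --epochs 5 \
        --lr 5e-4 --beta1 0.9 --beta2 0.98 --eps 1e-6 --wd 0.2 --warmup 500 \
        --precision amp --grad-checkpointing \
        --local-loss --gather-with-grad --ddp-static-graph \
        --workers 4 --log-every-n-steps 100 --zeroshot-frequency 2 \
        --save-most-recent --seed 0 \
        --logs /home/breaking_0.1_trained --name 10_most_difficult \
        --report-to wandb --wandb-project-name clip_text_hq_clusters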
2024-09-06,22:41:34 | INFO | Start epoch 0
2024-09-06,22:41:51 | INFO | Train Epoch: 0 [    4096/2572288 (0%)] Data (t): 11.911 Batch (t): 16.649, 246.023/s, 123.011/s/gpu LR: 0.000001 Logit Scale: 14.286 Contrastive_loss: 8.3776 (8.3776) Loss: 8.3776 (8.3776)
2024-09-06,22:41:54 | INFO | Reducer buckets have been rebuilt in this iteration.
2024-09-06,22:46:11 | INFO | Train Epoch: 0 [  413696/2572288 (16%)] Data (t): 0.555 Batch (t): 2.608, 1572.92/s, 786.459/s/gpu LR: 0.000101 Logit Scale: 14.264 Contrastive_loss: 8.2202 (8.2989) Loss: 8.2202 (8.2989)
2024-09-06,22:50:33 | INFO | Train Epoch: 0 [  823296/2572288 (32%)] Data (t): 0.568 Batch (t): 2.616, 1572.28/s, 786.140/s/gpu LR: 0.000201 Logit Scale: 14.244 Contrastive_loss: 7.9768 (8.1915) Loss: 7.9768 (8.1915)
2024-09-06,22:54:55 | INFO | Train Epoch: 0 [ 1232896/2572288 (48%)] Data (t): 0.570 Batch (t): 2.618, 1560.51/s, 780.257/s/gpu LR: 0.000301 Logit Scale: 14.227 Contrastive_loss: 7.9563 (8.1327) Loss: 7.9563 (8.1327)
2024-09-06,22:59:17 | INFO | Train Epoch: 0 [ 1642496/2572288 (64%)] Data (t): 0.574 Batch (t): 2.623, 1564.50/s, 782.249/s/gpu LR: 0.000401 Logit Scale: 14.205 Contrastive_loss: 7.9317 (8.0925) Loss: 7.9317 (8.0925)
2024-09-06,23:03:39 | INFO | Train Epoch: 0 [ 2052096/2572288 (80%)] Data (t): 0.571 Batch (t): 2.620, 1561.55/s, 780.776/s/gpu LR: 0.000500 Logit Scale: 14.192 Contrastive_loss: 7.7846 (8.0412) Loss: 7.7846 (8.0412)
2024-09-06,23:08:01 | INFO | Train Epoch: 0 [ 2461696/2572288 (96%)] Data (t): 0.569 Batch (t): 2.619, 1563.18/s, 781.589/s/gpu LR: 0.000498 Logit Scale: 14.198 Contrastive_loss: 6.8264 (7.8676) Loss: 6.8264 (7.8676)
2024-09-06,23:09:12 | INFO | Train Epoch: 0 [ 2572288/2572288 (100%)] Data (t): 0.569 Batch (t): 2.618, 1573.70/s, 786.852/s/gpu LR: 0.000497 Logit Scale: 14.199 Contrastive_loss: 7.5857 (7.8324) Loss: 7.5857 (7.8324)
2024-09-06,23:09:14 | INFO | Start epoch 1
2024-09-06,23:09:26 | INFO | Train Epoch: 1 [    4096/2572288 (0%)] Data (t): 9.742 Batch (t): 11.787, 347.508/s, 173.754/s/gpu LR: 0.000497 Logit Scale: 14.199 Contrastive_loss: 7.8159 (7.8159) Loss: 7.8159 (7.8159)
2024-09-06,23:13:45 | INFO | Train Epoch: 1 [  413696/2572288 (16%)] Data (t): 0.536 Batch (t): 2.594, 1566.93/s, 783.467/s/gpu LR: 0.000491 Logit Scale: 14.223 Contrastive_loss: 6.1751 (6.9955) Loss: 6.1751 (6.9955)
2024-09-06,23:18:08 | INFO | Train Epoch: 1 [  823296/2572288 (32%)] Data (t): 0.571 Batch (t): 2.622, 1565.08/s, 782.541/s/gpu LR: 0.000481 Logit Scale: 14.267 Contrastive_loss: 6.8320 (6.9410) Loss: 6.8320 (6.9410)
2024-09-06,23:22:30 | INFO | Train Epoch: 1 [ 1232896/2572288 (48%)] Data (t): 0.571 Batch (t): 2.621, 1560.41/s, 780.206/s/gpu LR: 0.000468 Logit Scale: 14.319 Contrastive_loss: 6.7536 (6.8941) Loss: 6.7536 (6.8941)
2024-09-06,23:26:52 | INFO | Train Epoch: 1 [ 1642496/2572288 (64%)] Data (t): 0.572 Batch (t): 2.625, 1564.15/s, 782.075/s/gpu LR: 0.000452 Logit Scale: 14.415 Contrastive_loss: 7.1712 (6.9495) Loss: 7.1712 (6.9495)
2024-09-06,23:31:14 | INFO | Train Epoch: 1 [ 2052096/2572288 (80%)] Data (t): 0.567 Batch (t): 2.618, 1564.96/s, 782.479/s/gpu LR: 0.000433 Logit Scale: 14.531 Contrastive_loss: 7.3828 (7.0218) Loss: 7.3828 (7.0218)
2024-09-06,23:35:36 | INFO | Train Epoch: 1 [ 2461696/2572288 (96%)] Data (t): 0.570 Batch (t): 2.619, 1559.05/s, 779.526/s/gpu LR: 0.000412 Logit Scale: 14.685 Contrastive_loss: 5.3914 (6.7888) Loss: 5.3914 (6.7888)
2024-09-06,23:36:47 | INFO | Train Epoch: 1 [ 2572288/2572288 (100%)] Data (t): 0.569 Batch (t): 2.618, 1573.02/s, 786.512/s/gpu LR: 0.000406 Logit Scale: 14.750 Contrastive_loss: 6.8457 (6.7959) Loss: 6.8457 (6.7959)
2024-09-06,23:36:49 | INFO | Start epoch 2
2024-09-06,23:37:01 | INFO | Train Epoch: 2 [    4096/2572288 (0%)] Data (t): 9.683 Batch (t): 11.729, 349.234/s, 174.617/s/gpu LR: 0.000405 Logit Scale: 14.752 Contrastive_loss: 3.8208 (3.8208) Loss: 3.8208 (3.8208)
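The LR column follows the configured warmup + cosine schedule: linear warmup over 500 steps (LR 0.000001 at the first step, peaking at the configured 0.0005 around the 80% mark of epoch 0), then cosine decay to zero over the remaining steps. A sketch that reproduces the logged values up to off-by-one step accounting, assuming 628 steps per epoch as implied by the sample counts (2572288 samples / 4096-sample global batch):

    import math

    base_lr, warmup = 5e-4, 500
    steps_per_epoch = 2572288 // 4096          # 628 global batches of 2 x 2048
    total_steps = 5 * steps_per_epoch          # 3140 steps over 5 epochs

    def lr_at(step):
        if step < warmup:
            return base_lr * (step + 1) / warmup                # linear warmup
        frac = (step - warmup) / (total_steps - warmup)         # decay progress in [0, 1]
        return 0.5 * (1 + math.cos(math.pi * frac)) * base_lr   # cosine decay to 0

    print(lr_at(0), lr_at(100), lr_at(600))  # ~0.000001, ~0.000101, ~0.000498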
2024-09-06,23:41:21 | INFO | Train Epoch: 2 [  413696/2572288 (16%)] Data (t): 0.539 Batch (t): 2.596, 1563.69/s, 781.844/s/gpu LR: 0.000381 Logit Scale: 14.930 Contrastive_loss: 5.3136 (4.5672) Loss: 5.3136 (4.5672)
2024-09-06,23:45:42 | INFO | Train Epoch: 2 [  823296/2572288 (32%)] Data (t): 0.564 Batch (t): 2.615, 1565.33/s, 782.664/s/gpu LR: 0.000355 Logit Scale: 15.116 Contrastive_loss: 6.5801 (5.2382) Loss: 6.5801 (5.2382)
2024-09-06,23:50:04 | INFO | Train Epoch: 2 [ 1232896/2572288 (48%)] Data (t): 0.571 Batch (t): 2.621, 1569.26/s, 784.629/s/gpu LR: 0.000327 Logit Scale: 15.352 Contrastive_loss: 3.4862 (4.8002) Loss: 3.4862 (4.8002)
2024-09-06,23:54:26 | INFO | Train Epoch: 2 [ 1642496/2572288 (64%)] Data (t): 0.569 Batch (t): 2.619, 1571.61/s, 785.806/s/gpu LR: 0.000298 Logit Scale: 15.478 Contrastive_loss: 6.0303 (5.0462) Loss: 6.0303 (5.0462)
2024-09-06,23:58:48 | INFO | Train Epoch: 2 [ 2052096/2572288 (80%)] Data (t): 0.568 Batch (t): 2.618, 1564.29/s, 782.143/s/gpu LR: 0.000269 Logit Scale: 15.784 Contrastive_loss: 6.0015 (5.2054) Loss: 6.0015 (5.2054)
2024-09-07,00:03:10 | INFO | Train Epoch: 2 [ 2461696/2572288 (96%)] Data (t): 0.573 Batch (t): 2.624, 1562.22/s, 781.111/s/gpu LR: 0.000239 Logit Scale: 16.079 Contrastive_loss: 3.7522 (4.9978) Loss: 3.7522 (4.9978)
2024-09-07,00:04:21 | INFO | Train Epoch: 2 [ 2572288/2572288 (100%)] Data (t): 0.569 Batch (t): 2.618, 1581.00/s, 790.499/s/gpu LR: 0.000231 Logit Scale: 16.160 Contrastive_loss: 3.6242 (4.8261) Loss: 3.6242 (4.8261)
2024-09-07,00:04:24 | INFO | Start epoch 3
2024-09-07,00:04:35 | INFO | Train Epoch: 3 [    4096/2572288 (0%)] Data (t): 9.688 Batch (t): 11.733, 349.111/s, 174.556/s/gpu LR: 0.000231 Logit Scale: 16.162 Contrastive_loss: 1.8643 (1.8643) Loss: 1.8643 (1.8643)
2024-09-07,00:08:56 | INFO | Train Epoch: 3 [  413696/2572288 (16%)] Data (t): 0.545 Batch (t): 2.604, 1559.72/s, 779.861/s/gpu LR: 0.000202 Logit Scale: 16.406 Contrastive_loss: 4.5089 (3.1866) Loss: 4.5089 (3.1866)
2024-09-07,00:13:18 | INFO | Train Epoch: 3 [  823296/2572288 (32%)] Data (t): 0.566 Batch (t): 2.618, 1566.12/s, 783.061/s/gpu LR: 0.000173 Logit Scale: 16.650 Contrastive_loss: 2.5566 (2.9766) Loss: 2.5566 (2.9766)
2024-09-07,00:17:40 | INFO | Train Epoch: 3 [ 1232896/2572288 (48%)] Data (t): 0.568 Batch (t): 2.619, 1562.05/s, 781.024/s/gpu LR: 0.000145 Logit Scale: 16.830 Contrastive_loss: 3.5824 (3.1280) Loss: 3.5824 (3.1280)
2024-09-07,00:22:01 | INFO | Train Epoch: 3 [ 1642496/2572288 (64%)] Data (t): 0.566 Batch (t): 2.618, 1560.91/s, 780.456/s/gpu LR: 0.000119 Logit Scale: 16.987 Contrastive_loss: 3.5628 (3.2150) Loss: 3.5628 (3.2150)
2024-09-07,00:26:23 | INFO | Train Epoch: 3 [ 2052096/2572288 (80%)] Data (t): 0.569 Batch (t): 2.620, 1568.81/s, 784.403/s/gpu LR: 0.000095 Logit Scale: 17.164 Contrastive_loss: 2.0013 (3.0127) Loss: 2.0013 (3.0127)
2024-09-07,00:30:45 | INFO | Train Epoch: 3 [ 2461696/2572288 (96%)] Data (t): 0.567 Batch (t): 2.619, 1560.00/s, 779.999/s/gpu LR: 0.000072 Logit Scale: 17.312 Contrastive_loss: 2.4090 (2.9265) Loss: 2.4090 (2.9265)
2024-09-07,00:31:56 | INFO | Train Epoch: 3 [ 2572288/2572288 (100%)] Data (t): 0.563 Batch (t): 2.616, 1573.45/s, 786.724/s/gpu LR: 0.000067 Logit Scale: 17.344 Contrastive_loss: 2.9960 (2.9352) Loss: 2.9960 (2.9352)
2024-09-07,00:31:59 | INFO | Start epoch 4
2024-09-07,00:32:11 | INFO | Train Epoch: 4 [    4096/2572288 (0%)] Data (t): 9.713 Batch (t): 11.759, 348.328/s, 174.164/s/gpu LR: 0.000067 Logit Scale: 17.345 Contrastive_loss: 1.7169 (1.7169) Loss: 1.7169 (1.7169)
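Contrastive_loss is the standard symmetric CLIP (InfoNCE) objective; the value in parentheses is the running mean for the epoch. Logit Scale is the exponential of the learned temperature parameter, initialized at 1/0.07 ≈ 14.286 (exactly the first logged value) and rising as training sharpens the similarity distribution. A single-process sketch of the loss, assuming open_clip's ClipLoss semantics: with local_loss and gather_with_grad the normalized features are first all-gathered across both GPUs, so N below is the 4096-sample global batch, and a randomly initialized model therefore starts near ln(4096) ≈ 8.32, consistent with the initial 8.3776.

    import torch
    import torch.nn.functional as F

    def clip_loss(image_features, text_features, logit_scale):
        # image_features, text_features: L2-normalized [N, D] embeddings
        logits = logit_scale * image_features @ text_features.T   # [N, N] similarities
        labels = torch.arange(logits.shape[0], device=logits.device)
        # symmetric cross-entropy: image i must match text i, and vice versa
        return (F.cross_entropy(logits, labels) +
                F.cross_entropy(logits.T, labels)) / 2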
2024-09-07,00:36:31 | INFO | Train Epoch: 4 [  413696/2572288 (16%)] Data (t): 0.543 Batch (t): 2.602, 1569.60/s, 784.799/s/gpu LR: 0.000048 Logit Scale: 17.440 Contrastive_loss: 1.7550 (1.7359) Loss: 1.7550 (1.7359)
2024-09-07,00:40:52 | INFO | Train Epoch: 4 [  823296/2572288 (32%)] Data (t): 0.563 Batch (t): 2.615, 1566.77/s, 783.385/s/gpu LR: 0.000032 Logit Scale: 17.498 Contrastive_loss: 3.3456 (2.2725) Loss: 3.3456 (2.2725)
2024-09-07,00:45:14 | INFO | Train Epoch: 4 [ 1232896/2572288 (48%)] Data (t): 0.567 Batch (t): 2.618, 1564.76/s, 782.378/s/gpu LR: 0.000019 Logit Scale: 17.536 Contrastive_loss: 1.5279 (2.0863) Loss: 1.5279 (2.0863)
2024-09-07,00:49:36 | INFO | Train Epoch: 4 [ 1642496/2572288 (64%)] Data (t): 0.565 Batch (t): 2.617, 1562.39/s, 781.195/s/gpu LR: 0.000009 Logit Scale: 17.557 Contrastive_loss: 1.9162 (2.0523) Loss: 1.9162 (2.0523)
2024-09-07,00:53:57 | INFO | Train Epoch: 4 [ 2052096/2572288 (80%)] Data (t): 0.565 Batch (t): 2.617, 1566.97/s, 783.487/s/gpu LR: 0.000003 Logit Scale: 17.566 Contrastive_loss: 1.6844 (1.9910) Loss: 1.6844 (1.9910)
2024-09-07,00:58:19 | INFO | Train Epoch: 4 [ 2461696/2572288 (96%)] Data (t): 0.563 Batch (t): 2.614, 1561.38/s, 780.692/s/gpu LR: 0.000000 Logit Scale: 17.568 Contrastive_loss: 1.5111 (1.9224) Loss: 1.5111 (1.9224)
2024-09-07,00:59:29 | INFO | Train Epoch: 4 [ 2572288/2572288 (100%)] Data (t): 0.559 Batch (t): 2.610, 1575.70/s, 787.852/s/gpu LR: 0.000000 Logit Scale: 17.568 Contrastive_loss: 1.7148 (1.8965) Loss: 1.7148 (1.8965)
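Throughput bookkeeping for these lines: each logged step processes one global batch of world_size x batch_size samples, so the per-second figures follow directly from the batch time (a sketch; the numbers differ slightly from the log because the logged timings are themselves smoothed):

    global_batch = 2 * 2048                 # world_size x batch_size = 4096 samples/step
    batch_time = 2.610                      # seconds, "Batch (t)" on the final line
    print(global_batch / batch_time)        # ~1569/s overall vs. 1575.70/s logged
    print(global_batch / batch_time / 2)    # ~785/s/gpu vs. 787.852/s/gpu logged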