2024-09-06,22:41:22 | INFO | No latest resume checkpoint found in /home/breaking_0.1_trained/10_most_difficult/checkpoints.
2024-09-06,22:41:24 | INFO | Running in distributed mode with multiple processes. Device: cuda:0. Process (global: 0, local: 0), total: 2.
2024-09-06,22:41:24 | INFO | Loaded ViT-B-32 model config.
2024-09-06,22:41:25 | INFO | Model:
2024-09-06,22:41:25 | INFO | CLIP(
  (visual): VisionTransformer(
    (patchnorm_pre_ln): Identity()
    (conv1): Conv2d(3, 768, kernel_size=(32, 32), stride=(32, 32), bias=False)
    (patch_dropout): Identity()
    (ln_pre): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (transformer): Transformer(
      (resblocks): ModuleList(
        (0-11): 12 x ResidualAttentionBlock(
          (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (attn): MultiheadAttention(
            (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
          )
          (ls_1): Identity()
          (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (mlp): Sequential(
            (c_fc): Linear(in_features=768, out_features=3072, bias=True)
            (gelu): GELU(approximate='none')
            (c_proj): Linear(in_features=3072, out_features=768, bias=True)
          )
          (ls_2): Identity()
        )
      )
    )
    (ln_post): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (transformer): Transformer(
    (resblocks): ModuleList(
      (0-11): 12 x ResidualAttentionBlock(
        (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (attn): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
        )
        (ls_1): Identity()
        (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (mlp): Sequential(
          (c_fc): Linear(in_features=512, out_features=2048, bias=True)
          (gelu): GELU(approximate='none')
          (c_proj): Linear(in_features=2048, out_features=512, bias=True)
        )
        (ls_2): Identity()
      )
    )
  )
  (token_embedding): Embedding(49408, 512)
  (ln_final): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
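The module tree above is the standard open_clip ViT-B-32: a 12-layer, width-768 vision transformer over 32x32 patches and a 12-layer, width-512 text transformer with a 49408-token vocabulary (repeated identical blocks shown once with PyTorch's "(0-11): 12 x" notation). A minimal sketch of instantiating it, assuming the open_clip Python package this log appears to come from:

    import open_clip

    # pretrained is empty in the params below, i.e. the model trains from scratch
    model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms("ViT-B-32")
    print(model)  # prints the CLIP(...) module tree shown above
    print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M params")  # ~151M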
2024-09-06,22:41:25 | INFO | Params:
2024-09-06,22:41:25 | INFO | accum_freq: 1
2024-09-06,22:41:25 | INFO | aug_cfg: {}
2024-09-06,22:41:25 | INFO | batch_size: 2048
2024-09-06,22:41:25 | INFO | beta1: 0.9
2024-09-06,22:41:25 | INFO | beta2: 0.98
2024-09-06,22:41:25 | INFO | checkpoint_path: /home/breaking_0.1_trained/10_most_difficult/checkpoints
2024-09-06,22:41:25 | INFO | coca_caption_loss_weight: 2.0
2024-09-06,22:41:25 | INFO | coca_contrastive_loss_weight: 1.0
2024-09-06,22:41:25 | INFO | copy_codebase: False
2024-09-06,22:41:25 | INFO | csv_caption_key: title
2024-09-06,22:41:25 | INFO | csv_img_key: filepath
2024-09-06,22:41:25 | INFO | csv_separator:
2024-09-06,22:41:25 | INFO | dataset_resampled: True
2024-09-06,22:41:25 | INFO | dataset_type: webdataset
2024-09-06,22:41:25 | INFO | ddp_static_graph: True
2024-09-06,22:41:25 | INFO | debug: False
2024-09-06,22:41:25 | INFO | delete_previous_checkpoint: False
2024-09-06,22:41:25 | INFO | device: cuda:0
2024-09-06,22:41:25 | INFO | dist_backend: nccl
2024-09-06,22:41:25 | INFO | dist_url: env://
2024-09-06,22:41:25 | INFO | distill: False
2024-09-06,22:41:25 | INFO | distill_model: None
2024-09-06,22:41:25 | INFO | distill_pretrained: None
2024-09-06,22:41:25 | INFO | distributed: True
2024-09-06,22:41:25 | INFO | epochs: 5
2024-09-06,22:41:25 | INFO | epochs_cooldown: None
2024-09-06,22:41:25 | INFO | eps: 1e-06
2024-09-06,22:41:25 | INFO | force_custom_text: False
2024-09-06,22:41:25 | INFO | force_image_size: None
2024-09-06,22:41:25 | INFO | force_patch_dropout: None
2024-09-06,22:41:25 | INFO | force_quick_gelu: False
2024-09-06,22:41:25 | INFO | gather_with_grad: True
2024-09-06,22:41:25 | INFO | grad_checkpointing: True
2024-09-06,22:41:25 | INFO | grad_clip_norm: None
2024-09-06,22:41:25 | INFO | horovod: False
2024-09-06,22:41:25 | INFO | image_mean: None
2024-09-06,22:41:25 | INFO | image_std: None
2024-09-06,22:41:25 | INFO | imagenet_v2: None
2024-09-06,22:41:25 | INFO | imagenet_val: None
2024-09-06,22:41:25 | INFO | local_loss: True
2024-09-06,22:41:25 | INFO | local_rank: 0
2024-09-06,22:41:25 | INFO | lock_image: False
2024-09-06,22:41:25 | INFO | lock_image_freeze_bn_stats: False
2024-09-06,22:41:25 | INFO | lock_image_unlocked_groups: 0
2024-09-06,22:41:25 | INFO | lock_text: False
2024-09-06,22:41:25 | INFO | lock_text_freeze_layer_norm: False
2024-09-06,22:41:25 | INFO | lock_text_unlocked_layers: 0
2024-09-06,22:41:25 | INFO | log_every_n_steps: 100
2024-09-06,22:41:25 | INFO | log_level: 20
2024-09-06,22:41:25 | INFO | log_local: False
2024-09-06,22:41:25 | INFO | log_path: /home/breaking_0.1_trained/10_most_difficult/out.log
2024-09-06,22:41:25 | INFO | logs: /home/breaking_0.1_trained
2024-09-06,22:41:25 | INFO | lr: 0.0005
2024-09-06,22:41:25 | INFO | lr_cooldown_end: 0.0
2024-09-06,22:41:25 | INFO | lr_cooldown_power: 1.0
2024-09-06,22:41:25 | INFO | lr_scheduler: cosine
2024-09-06,22:41:25 | INFO | model: ViT-B-32
2024-09-06,22:41:25 | INFO | name: 10_most_difficult
2024-09-06,22:41:25 | INFO | no_set_device_rank: False
2024-09-06,22:41:25 | INFO | precision: amp
2024-09-06,22:41:25 | INFO | pretrained:
2024-09-06,22:41:25 | INFO | pretrained_image: False
2024-09-06,22:41:25 | INFO | rank: 0
2024-09-06,22:41:25 | INFO | remote_sync: None
2024-09-06,22:41:25 | INFO | remote_sync_frequency: 300
2024-09-06,22:41:25 | INFO | remote_sync_protocol: s3
2024-09-06,22:41:25 | INFO | report_to: wandb
2024-09-06,22:41:25 | INFO | resume: None
2024-09-06,22:41:25 | INFO | save_frequency: 0
2024-09-06,22:41:25 | INFO | save_most_recent: True
2024-09-06,22:41:25 | INFO | seed: 0
2024-09-06,22:41:25 | INFO | skip_scheduler: False
2024-09-06,22:41:25 | INFO | tensorboard: False
2024-09-06,22:41:25 | INFO | tensorboard_path:
2024-09-06,22:41:25 | INFO | torchscript: False
2024-09-06,22:41:25 | INFO | trace: False
2024-09-06,22:41:25 | INFO | train_data: /home/breaking_0.1/{00000000..00000127}.tar
2024-09-06,22:41:25 | INFO | train_data_upsampling_factors: None
2024-09-06,22:41:25 | INFO | train_num_samples: 2560000
2024-09-06,22:41:25 | INFO | use_bn_sync: False
2024-09-06,22:41:25 | INFO | val_data: None
2024-09-06,22:41:25 | INFO | val_frequency: 1
2024-09-06,22:41:25 | INFO | val_num_samples: None
2024-09-06,22:41:25 | INFO | wandb: True
2024-09-06,22:41:25 | INFO | wandb_notes:
2024-09-06,22:41:25 | INFO | wandb_project_name: clip_text_hq_clusters
2024-09-06,22:41:25 | INFO | warmup: 500
2024-09-06,22:41:25 | INFO | wd: 0.2
2024-09-06,22:41:25 | INFO | workers: 4
2024-09-06,22:41:25 | INFO | world_size: 2
2024-09-06,22:41:25 | INFO | zeroshot_frequency: 2
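The parameter dump above maps onto open_clip's training CLI roughly as follows (a sketch, not the exact command: the entry-point module name varies between open_clip versions, and only flags that differ from defaults are shown; world_size 2 corresponds to two processes on one node):

    torchrun --nproc_per_node 2 -m open_clip_train.main \
        --model ViT-B-32 \
        --train-data '/home/breaking_0.1/{00000000..00000127}.tar' \
        --dataset-type webdataset --dataset-resampled \
        --train-num-samples 2560000 \
        --batch-size 2048 --epochs 5 \
        --lr 5e-4 --beta1 0.9 --beta2 0.98 --eps 1e-6 --wd 0.2 --warmup 500 \
        --precision amp --grad-checkpointing \
        --local-loss --gather-with-grad --ddp-static-graph \
        --workers 4 --log-every-n-steps 100 --zeroshot-frequency 2 \
        --save-most-recent --seed 0 \
        --logs /home/breaking_0.1_trained --name 10_most_difficult \
        --report-to wandb --wandb-project-name clip_text_hq_clusters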
2024-09-06,22:41:34 | INFO | Start epoch 0
2024-09-06,22:41:51 | INFO | Train Epoch: 0 [    4096/2572288 (0%)] Data (t): 11.911 Batch (t): 16.649, 246.023/s, 123.011/s/gpu LR: 0.000001 Logit Scale: 14.286 Contrastive_loss: 8.3776 (8.3776) Loss: 8.3776 (8.3776)
2024-09-06,22:41:54 | INFO | Reducer buckets have been rebuilt in this iteration.
2024-09-06,22:46:11 | INFO | Train Epoch: 0 [  413696/2572288 (16%)] Data (t): 0.555 Batch (t): 2.608, 1572.92/s, 786.459/s/gpu LR: 0.000101 Logit Scale: 14.264 Contrastive_loss: 8.2202 (8.2989) Loss: 8.2202 (8.2989)
2024-09-06,22:50:33 | INFO | Train Epoch: 0 [  823296/2572288 (32%)] Data (t): 0.568 Batch (t): 2.616, 1572.28/s, 786.140/s/gpu LR: 0.000201 Logit Scale: 14.244 Contrastive_loss: 7.9768 (8.1915) Loss: 7.9768 (8.1915)
2024-09-06,22:54:55 | INFO | Train Epoch: 0 [ 1232896/2572288 (48%)] Data (t): 0.570 Batch (t): 2.618, 1560.51/s, 780.257/s/gpu LR: 0.000301 Logit Scale: 14.227 Contrastive_loss: 7.9563 (8.1327) Loss: 7.9563 (8.1327)
2024-09-06,22:59:17 | INFO | Train Epoch: 0 [ 1642496/2572288 (64%)] Data (t): 0.574 Batch (t): 2.623, 1564.50/s, 782.249/s/gpu LR: 0.000401 Logit Scale: 14.205 Contrastive_loss: 7.9317 (8.0925) Loss: 7.9317 (8.0925)
2024-09-06,23:03:39 | INFO | Train Epoch: 0 [ 2052096/2572288 (80%)] Data (t): 0.571 Batch (t): 2.620, 1561.55/s, 780.776/s/gpu LR: 0.000500 Logit Scale: 14.192 Contrastive_loss: 7.7846 (8.0412) Loss: 7.7846 (8.0412)
2024-09-06,23:08:01 | INFO | Train Epoch: 0 [ 2461696/2572288 (96%)] Data (t): 0.569 Batch (t): 2.619, 1563.18/s, 781.589/s/gpu LR: 0.000498 Logit Scale: 14.198 Contrastive_loss: 6.8264 (7.8676) Loss: 6.8264 (7.8676)
2024-09-06,23:09:12 | INFO | Train Epoch: 0 [ 2572288/2572288 (100%)] Data (t): 0.569 Batch (t): 2.618, 1573.70/s, 786.852/s/gpu LR: 0.000497 Logit Scale: 14.199 Contrastive_loss: 7.5857 (7.8324) Loss: 7.5857 (7.8324)
2024-09-06,23:09:14 | INFO | Start epoch 1
2024-09-06,23:09:26 | INFO | Train Epoch: 1 [    4096/2572288 (0%)] Data (t): 9.742 Batch (t): 11.787, 347.508/s, 173.754/s/gpu LR: 0.000497 Logit Scale: 14.199 Contrastive_loss: 7.8159 (7.8159) Loss: 7.8159 (7.8159)
2024-09-06,23:13:45 | INFO | Train Epoch: 1 [  413696/2572288 (16%)] Data (t): 0.536 Batch (t): 2.594, 1566.93/s, 783.467/s/gpu LR: 0.000491 Logit Scale: 14.223 Contrastive_loss: 6.1751 (6.9955) Loss: 6.1751 (6.9955)
2024-09-06,23:18:08 | INFO | Train Epoch: 1 [  823296/2572288 (32%)] Data (t): 0.571 Batch (t): 2.622, 1565.08/s, 782.541/s/gpu LR: 0.000481 Logit Scale: 14.267 Contrastive_loss: 6.8320 (6.9410) Loss: 6.8320 (6.9410)
2024-09-06,23:22:30 | INFO | Train Epoch: 1 [ 1232896/2572288 (48%)] Data (t): 0.571 Batch (t): 2.621, 1560.41/s, 780.206/s/gpu LR: 0.000468 Logit Scale: 14.319 Contrastive_loss: 6.7536 (6.8941) Loss: 6.7536 (6.8941)
2024-09-06,23:26:52 | INFO | Train Epoch: 1 [ 1642496/2572288 (64%)] Data (t): 0.572 Batch (t): 2.625, 1564.15/s, 782.075/s/gpu LR: 0.000452 Logit Scale: 14.415 Contrastive_loss: 7.1712 (6.9495) Loss: 7.1712 (6.9495)
2024-09-06,23:31:14 | INFO | Train Epoch: 1 [ 2052096/2572288 (80%)] Data (t): 0.567 Batch (t): 2.618, 1564.96/s, 782.479/s/gpu LR: 0.000433 Logit Scale: 14.531 Contrastive_loss: 7.3828 (7.0218) Loss: 7.3828 (7.0218)
2024-09-06,23:35:36 | INFO | Train Epoch: 1 [ 2461696/2572288 (96%)] Data (t): 0.570 Batch (t): 2.619, 1559.05/s, 779.526/s/gpu LR: 0.000412 Logit Scale: 14.685 Contrastive_loss: 5.3914 (6.7888) Loss: 5.3914 (6.7888)
2024-09-06,23:36:47 | INFO | Train Epoch: 1 [ 2572288/2572288 (100%)] Data (t): 0.569 Batch (t): 2.618, 1573.02/s, 786.512/s/gpu LR: 0.000406 Logit Scale: 14.750 Contrastive_loss: 6.8457 (6.7959) Loss: 6.8457 (6.7959)
2024-09-06,23:36:49 | INFO | Start epoch 2
2024-09-06,23:37:01 | INFO | Train Epoch: 2 [    4096/2572288 (0%)] Data (t): 9.683 Batch (t): 11.729, 349.234/s, 174.617/s/gpu LR: 0.000405 Logit Scale: 14.752 Contrastive_loss: 3.8208 (3.8208) Loss: 3.8208 (3.8208)
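The LR column follows the configured warmup + cosine schedule: linear warmup over 500 steps (LR 0.000001 at the first step, peaking at the configured 0.0005 around the 80% mark of epoch 0), then cosine decay to zero over the remaining steps. A sketch that reproduces the logged values up to off-by-one step accounting, assuming 628 steps per epoch as implied by the sample counts (2572288 samples / 4096-sample global batch):

    import math

    base_lr, warmup = 5e-4, 500
    steps_per_epoch = 2572288 // 4096          # 628 global batches of 2 x 2048
    total_steps = 5 * steps_per_epoch          # 3140 steps over 5 epochs

    def lr_at(step):
        if step < warmup:
            return base_lr * (step + 1) / warmup                # linear warmup
        frac = (step - warmup) / (total_steps - warmup)         # decay progress in [0, 1]
        return 0.5 * (1 + math.cos(math.pi * frac)) * base_lr   # cosine decay to 0

    print(lr_at(0), lr_at(100), lr_at(600))  # ~0.000001, ~0.000101, ~0.000498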
2024-09-06,23:41:21 | INFO | Train Epoch: 2 [  413696/2572288 (16%)] Data (t): 0.539 Batch (t): 2.596, 1563.69/s, 781.844/s/gpu LR: 0.000381 Logit Scale: 14.930 Contrastive_loss: 5.3136 (4.5672) Loss: 5.3136 (4.5672)
2024-09-06,23:45:42 | INFO | Train Epoch: 2 [  823296/2572288 (32%)] Data (t): 0.564 Batch (t): 2.615, 1565.33/s, 782.664/s/gpu LR: 0.000355 Logit Scale: 15.116 Contrastive_loss: 6.5801 (5.2382) Loss: 6.5801 (5.2382)
2024-09-06,23:50:04 | INFO | Train Epoch: 2 [ 1232896/2572288 (48%)] Data (t): 0.571 Batch (t): 2.621, 1569.26/s, 784.629/s/gpu LR: 0.000327 Logit Scale: 15.352 Contrastive_loss: 3.4862 (4.8002) Loss: 3.4862 (4.8002)
2024-09-06,23:54:26 | INFO | Train Epoch: 2 [ 1642496/2572288 (64%)] Data (t): 0.569 Batch (t): 2.619, 1571.61/s, 785.806/s/gpu LR: 0.000298 Logit Scale: 15.478 Contrastive_loss: 6.0303 (5.0462) Loss: 6.0303 (5.0462)
2024-09-06,23:58:48 | INFO | Train Epoch: 2 [ 2052096/2572288 (80%)] Data (t): 0.568 Batch (t): 2.618, 1564.29/s, 782.143/s/gpu LR: 0.000269 Logit Scale: 15.784 Contrastive_loss: 6.0015 (5.2054) Loss: 6.0015 (5.2054)
2024-09-07,00:03:10 | INFO | Train Epoch: 2 [ 2461696/2572288 (96%)] Data (t): 0.573 Batch (t): 2.624, 1562.22/s, 781.111/s/gpu LR: 0.000239 Logit Scale: 16.079 Contrastive_loss: 3.7522 (4.9978) Loss: 3.7522 (4.9978)
2024-09-07,00:04:21 | INFO | Train Epoch: 2 [ 2572288/2572288 (100%)] Data (t): 0.569 Batch (t): 2.618, 1581.00/s, 790.499/s/gpu LR: 0.000231 Logit Scale: 16.160 Contrastive_loss: 3.6242 (4.8261) Loss: 3.6242 (4.8261)
2024-09-07,00:04:24 | INFO | Start epoch 3
2024-09-07,00:04:35 | INFO | Train Epoch: 3 [    4096/2572288 (0%)] Data (t): 9.688 Batch (t): 11.733, 349.111/s, 174.556/s/gpu LR: 0.000231 Logit Scale: 16.162 Contrastive_loss: 1.8643 (1.8643) Loss: 1.8643 (1.8643)
2024-09-07,00:08:56 | INFO | Train Epoch: 3 [  413696/2572288 (16%)] Data (t): 0.545 Batch (t): 2.604, 1559.72/s, 779.861/s/gpu LR: 0.000202 Logit Scale: 16.406 Contrastive_loss: 4.5089 (3.1866) Loss: 4.5089 (3.1866)
2024-09-07,00:13:18 | INFO | Train Epoch: 3 [  823296/2572288 (32%)] Data (t): 0.566 Batch (t): 2.618, 1566.12/s, 783.061/s/gpu LR: 0.000173 Logit Scale: 16.650 Contrastive_loss: 2.5566 (2.9766) Loss: 2.5566 (2.9766)
2024-09-07,00:17:40 | INFO | Train Epoch: 3 [ 1232896/2572288 (48%)] Data (t): 0.568 Batch (t): 2.619, 1562.05/s, 781.024/s/gpu LR: 0.000145 Logit Scale: 16.830 Contrastive_loss: 3.5824 (3.1280) Loss: 3.5824 (3.1280)
2024-09-07,00:22:01 | INFO | Train Epoch: 3 [ 1642496/2572288 (64%)] Data (t): 0.566 Batch (t): 2.618, 1560.91/s, 780.456/s/gpu LR: 0.000119 Logit Scale: 16.987 Contrastive_loss: 3.5628 (3.2150) Loss: 3.5628 (3.2150)
2024-09-07,00:26:23 | INFO | Train Epoch: 3 [ 2052096/2572288 (80%)] Data (t): 0.569 Batch (t): 2.620, 1568.81/s, 784.403/s/gpu LR: 0.000095 Logit Scale: 17.164 Contrastive_loss: 2.0013 (3.0127) Loss: 2.0013 (3.0127)
2024-09-07,00:30:45 | INFO | Train Epoch: 3 [ 2461696/2572288 (96%)] Data (t): 0.567 Batch (t): 2.619, 1560.00/s, 779.999/s/gpu LR: 0.000072 Logit Scale: 17.312 Contrastive_loss: 2.4090 (2.9265) Loss: 2.4090 (2.9265)
2024-09-07,00:31:56 | INFO | Train Epoch: 3 [ 2572288/2572288 (100%)] Data (t): 0.563 Batch (t): 2.616, 1573.45/s, 786.724/s/gpu LR: 0.000067 Logit Scale: 17.344 Contrastive_loss: 2.9960 (2.9352) Loss: 2.9960 (2.9352)
2024-09-07,00:31:59 | INFO | Start epoch 4
2024-09-07,00:32:11 | INFO | Train Epoch: 4 [    4096/2572288 (0%)] Data (t): 9.713 Batch (t): 11.759, 348.328/s, 174.164/s/gpu LR: 0.000067 Logit Scale: 17.345 Contrastive_loss: 1.7169 (1.7169) Loss: 1.7169 (1.7169)
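Contrastive_loss is the standard symmetric CLIP (InfoNCE) objective; the value in parentheses is the running mean for the epoch. Logit Scale is the exponential of the learned temperature parameter, initialized at 1/0.07 ≈ 14.286 (exactly the first logged value) and rising as training sharpens the similarity distribution. A single-process sketch of the loss, assuming open_clip's ClipLoss semantics: with local_loss and gather_with_grad the normalized features are first all-gathered across both GPUs, so N below is the 4096-sample global batch, and a randomly initialized model therefore starts near ln(4096) ≈ 8.32, consistent with the initial 8.3776.

    import torch
    import torch.nn.functional as F

    def clip_loss(image_features, text_features, logit_scale):
        # image_features, text_features: L2-normalized [N, D] embeddings
        logits = logit_scale * image_features @ text_features.T   # [N, N] similarities
        labels = torch.arange(logits.shape[0], device=logits.device)
        # symmetric cross-entropy: image i must match text i, and vice versa
        return (F.cross_entropy(logits, labels) +
                F.cross_entropy(logits.T, labels)) / 2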
2024-09-07,00:36:31 | INFO | Train Epoch: 4 [  413696/2572288 (16%)] Data (t): 0.543 Batch (t): 2.602, 1569.60/s, 784.799/s/gpu LR: 0.000048 Logit Scale: 17.440 Contrastive_loss: 1.7550 (1.7359) Loss: 1.7550 (1.7359)
2024-09-07,00:40:52 | INFO | Train Epoch: 4 [  823296/2572288 (32%)] Data (t): 0.563 Batch (t): 2.615, 1566.77/s, 783.385/s/gpu LR: 0.000032 Logit Scale: 17.498 Contrastive_loss: 3.3456 (2.2725) Loss: 3.3456 (2.2725)
2024-09-07,00:45:14 | INFO | Train Epoch: 4 [ 1232896/2572288 (48%)] Data (t): 0.567 Batch (t): 2.618, 1564.76/s, 782.378/s/gpu LR: 0.000019 Logit Scale: 17.536 Contrastive_loss: 1.5279 (2.0863) Loss: 1.5279 (2.0863)
2024-09-07,00:49:36 | INFO | Train Epoch: 4 [ 1642496/2572288 (64%)] Data (t): 0.565 Batch (t): 2.617, 1562.39/s, 781.195/s/gpu LR: 0.000009 Logit Scale: 17.557 Contrastive_loss: 1.9162 (2.0523) Loss: 1.9162 (2.0523)
2024-09-07,00:53:57 | INFO | Train Epoch: 4 [ 2052096/2572288 (80%)] Data (t): 0.565 Batch (t): 2.617, 1566.97/s, 783.487/s/gpu LR: 0.000003 Logit Scale: 17.566 Contrastive_loss: 1.6844 (1.9910) Loss: 1.6844 (1.9910)
2024-09-07,00:58:19 | INFO | Train Epoch: 4 [ 2461696/2572288 (96%)] Data (t): 0.563 Batch (t): 2.614, 1561.38/s, 780.692/s/gpu LR: 0.000000 Logit Scale: 17.568 Contrastive_loss: 1.5111 (1.9224) Loss: 1.5111 (1.9224)
2024-09-07,00:59:29 | INFO | Train Epoch: 4 [ 2572288/2572288 (100%)] Data (t): 0.559 Batch (t): 2.610, 1575.70/s, 787.852/s/gpu LR: 0.000000 Logit Scale: 17.568 Contrastive_loss: 1.7148 (1.8965) Loss: 1.7148 (1.8965)
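Throughput bookkeeping for these lines: each logged step processes one global batch of world_size x batch_size samples, so the per-second figures follow directly from the batch time (a sketch; the numbers differ slightly from the log because the logged timings are themselves smoothed):

    global_batch = 2 * 2048                 # world_size x batch_size = 4096 samples/step
    batch_time = 2.610                      # seconds, "Batch (t)" on the final line
    print(global_batch / batch_time)        # ~1569/s overall vs. 1575.70/s logged
    print(global_batch / batch_time / 2)    # ~785/s/gpu vs. 787.852/s/gpu logged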