HanSolo9682 committed on
Commit 09a501b · 1 Parent(s): 5b34380

upload clip model

Files changed (3)
  1. checkpoints/epoch_5.pt +3 -0
  2. out.log +167 -0
  3. params.txt +96 -0
checkpoints/epoch_5.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1b9dc2e7c7505183c19352be2fabb78980bd337c510b029e2528c33ae84e3555
+ size 1815697682
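checkpoints/epoch_5.pt is stored as a Git LFS pointer, so the repository itself holds only the three `key value` lines above rather than the ~1.8 GB weights. A minimal sketch of reading such a pointer's fields (the `parse_lfs_pointer` helper is illustrative, not part of this repo):

```python
# Parse a Git LFS pointer file (spec v1): each line is "key value".
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:1b9dc2e7c7505183c19352be2fabb78980bd337c510b029e2528c33ae84e3555\n"
    "size 1815697682\n"
)
info = parse_lfs_pointer(pointer)  # info["size"] is the blob size in bytes, as a string
```

Fetching the actual checkpoint (e.g. via `git lfs pull` or the Hub download URL) replaces this pointer with the real file.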
out.log ADDED
@@ -0,0 +1,167 @@
+ 2024-02-12,02:49:55 | INFO | Running with a single process. Device cuda:0.
+ 2024-02-12,02:49:55 | INFO | Loaded ViT-B-32 model config.
+ 2024-02-12,02:49:58 | INFO | Loading pretrained ViT-B-32 weights (laion2b_s34b_b79k).
+ 2024-02-12,02:49:58 | INFO | Model:
+ 2024-02-12,02:49:58 | INFO | CLIP(
+   (visual): VisionTransformer(
+     (conv1): Conv2d(3, 768, kernel_size=(32, 32), stride=(32, 32), bias=False)
+     (patch_dropout): Identity()
+     (ln_pre): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+     (transformer): Transformer(
+       (resblocks): ModuleList(
+         (0-11): 12 x ResidualAttentionBlock(
+           (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+           (attn): MultiheadAttention(
+             (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+           )
+           (ls_1): Identity()
+           (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+           (mlp): Sequential(
+             (c_fc): Linear(in_features=768, out_features=3072, bias=True)
+             (gelu): GELU(approximate='none')
+             (c_proj): Linear(in_features=3072, out_features=768, bias=True)
+           )
+           (ls_2): Identity()
+         )
+       )
+     )
+     (ln_post): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+   )
+   (transformer): Transformer(
+     (resblocks): ModuleList(
+       (0-11): 12 x ResidualAttentionBlock(
+         (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
+         (attn): MultiheadAttention(
+           (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
+         )
+         (ls_1): Identity()
+         (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
+         (mlp): Sequential(
+           (c_fc): Linear(in_features=512, out_features=2048, bias=True)
+           (gelu): GELU(approximate='none')
+           (c_proj): Linear(in_features=2048, out_features=512, bias=True)
+         )
+         (ls_2): Identity()
+       )
+     )
+   )
+   (token_embedding): Embedding(49408, 512)
+   (ln_final): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
+ )
+ 2024-02-12,02:49:58 | INFO | Params:
+ 2024-02-12,02:49:58 | INFO | accum_freq: 1
+ 2024-02-12,02:49:58 | INFO | aug_cfg: {}
+ 2024-02-12,02:49:58 | INFO | batch_size: 256
+ 2024-02-12,02:49:58 | INFO | beta1: 0.9
+ 2024-02-12,02:49:58 | INFO | beta2: 0.98
+ 2024-02-12,02:49:58 | INFO | checkpoint_path: ./logs/2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16/checkpoints
+ 2024-02-12,02:49:58 | INFO | coca_caption_loss_weight: 2.0
+ 2024-02-12,02:49:58 | INFO | coca_contrastive_loss_weight: 1.0
+ 2024-02-12,02:49:58 | INFO | copy_codebase: False
+ 2024-02-12,02:49:58 | INFO | csv_caption_key: captions
+ 2024-02-12,02:49:58 | INFO | csv_img_key: images
+ 2024-02-12,02:49:58 | INFO | csv_separator:
+ 2024-02-12,02:49:58 | INFO | dataset_resampled: False
+ 2024-02-12,02:49:58 | INFO | dataset_type: auto
+ 2024-02-12,02:49:58 | INFO | ddp_static_graph: True
+ 2024-02-12,02:49:58 | INFO | debug: False
+ 2024-02-12,02:49:58 | INFO | delete_previous_checkpoint: False
+ 2024-02-12,02:49:58 | INFO | device: cuda:0
+ 2024-02-12,02:49:58 | INFO | dist_backend: nccl
+ 2024-02-12,02:49:58 | INFO | dist_url: env://
+ 2024-02-12,02:49:58 | INFO | distill: False
+ 2024-02-12,02:49:58 | INFO | distill_model: None
+ 2024-02-12,02:49:58 | INFO | distill_pretrained: None
+ 2024-02-12,02:49:58 | INFO | distributed: False
+ 2024-02-12,02:49:58 | INFO | epochs: 5
+ 2024-02-12,02:49:58 | INFO | epochs_cooldown: None
+ 2024-02-12,02:49:58 | INFO | eps: 1e-06
+ 2024-02-12,02:49:58 | INFO | force_custom_text: False
+ 2024-02-12,02:49:58 | INFO | force_image_size: None
+ 2024-02-12,02:49:58 | INFO | force_patch_dropout: None
+ 2024-02-12,02:49:58 | INFO | force_quick_gelu: False
+ 2024-02-12,02:49:58 | INFO | gather_with_grad: True
+ 2024-02-12,02:49:58 | INFO | grad_checkpointing: False
+ 2024-02-12,02:49:58 | INFO | grad_clip_norm: None
+ 2024-02-12,02:49:58 | INFO | horovod: False
+ 2024-02-12,02:49:58 | INFO | image_interpolation: None
+ 2024-02-12,02:49:58 | INFO | image_mean: None
+ 2024-02-12,02:49:58 | INFO | image_resize_mode: None
+ 2024-02-12,02:49:58 | INFO | image_std: None
+ 2024-02-12,02:49:58 | INFO | imagenet_v2: None
+ 2024-02-12,02:49:58 | INFO | imagenet_val: None
+ 2024-02-12,02:49:58 | INFO | local_loss: True
+ 2024-02-12,02:49:58 | INFO | local_rank: 0
+ 2024-02-12,02:49:58 | INFO | lock_image: False
+ 2024-02-12,02:49:58 | INFO | lock_image_freeze_bn_stats: False
+ 2024-02-12,02:49:58 | INFO | lock_image_unlocked_groups: 0
+ 2024-02-12,02:49:58 | INFO | lock_text: False
+ 2024-02-12,02:49:58 | INFO | lock_text_freeze_layer_norm: False
+ 2024-02-12,02:49:58 | INFO | lock_text_unlocked_layers: 0
+ 2024-02-12,02:49:58 | INFO | log_every_n_steps: 100
+ 2024-02-12,02:49:58 | INFO | log_level: 20
+ 2024-02-12,02:49:58 | INFO | log_local: False
+ 2024-02-12,02:49:58 | INFO | log_path: ./logs/2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16/out.log
+ 2024-02-12,02:49:58 | INFO | logs: ./logs/
+ 2024-02-12,02:49:58 | INFO | lr: 1e-05
+ 2024-02-12,02:49:58 | INFO | lr_cooldown_end: 0.0
+ 2024-02-12,02:49:58 | INFO | lr_cooldown_power: 1.0
+ 2024-02-12,02:49:58 | INFO | lr_scheduler: cosine
+ 2024-02-12,02:49:58 | INFO | model: ViT-B-32
+ 2024-02-12,02:49:58 | INFO | name: 2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16
+ 2024-02-12,02:49:58 | INFO | no_set_device_rank: False
+ 2024-02-12,02:49:58 | INFO | precision: amp_bf16
+ 2024-02-12,02:49:58 | INFO | pretrained: laion2b_s34b_b79k
+ 2024-02-12,02:49:58 | INFO | pretrained_image: False
+ 2024-02-12,02:49:58 | INFO | rank: 0
+ 2024-02-12,02:49:58 | INFO | remote_sync: None
+ 2024-02-12,02:49:58 | INFO | remote_sync_frequency: 300
+ 2024-02-12,02:49:58 | INFO | remote_sync_protocol: s3
+ 2024-02-12,02:49:58 | INFO | report_to:
+ 2024-02-12,02:49:58 | INFO | resume: None
+ 2024-02-12,02:49:58 | INFO | save_frequency: 5
+ 2024-02-12,02:49:58 | INFO | save_most_recent: False
+ 2024-02-12,02:49:58 | INFO | seed: 0
+ 2024-02-12,02:49:58 | INFO | siglip: False
+ 2024-02-12,02:49:58 | INFO | skip_scheduler: False
+ 2024-02-12,02:49:58 | INFO | tensorboard: False
+ 2024-02-12,02:49:58 | INFO | tensorboard_path:
+ 2024-02-12,02:49:58 | INFO | torchcompile: False
+ 2024-02-12,02:49:58 | INFO | torchscript: False
+ 2024-02-12,02:49:58 | INFO | trace: False
+ 2024-02-12,02:49:58 | INFO | train_data: ../../train_data_counterfactuals_neg_clip2.csv
+ 2024-02-12,02:49:58 | INFO | train_data_upsampling_factors: None
+ 2024-02-12,02:49:58 | INFO | train_num_samples: None
+ 2024-02-12,02:49:58 | INFO | use_bn_sync: False
+ 2024-02-12,02:49:58 | INFO | use_bnb_linear: None
+ 2024-02-12,02:49:58 | INFO | val_data: None
+ 2024-02-12,02:49:58 | INFO | val_frequency: 5
+ 2024-02-12,02:49:58 | INFO | val_num_samples: None
+ 2024-02-12,02:49:58 | INFO | wandb: False
+ 2024-02-12,02:49:58 | INFO | wandb_notes:
+ 2024-02-12,02:49:58 | INFO | wandb_project_name: open-clip
+ 2024-02-12,02:49:58 | INFO | warmup: 1024
+ 2024-02-12,02:49:58 | INFO | wd: 0.2
+ 2024-02-12,02:49:58 | INFO | workers: 8
+ 2024-02-12,02:49:58 | INFO | world_size: 1
+ 2024-02-12,02:49:58 | INFO | zeroshot_frequency: 5
+ 2024-02-12,02:49:58 | INFO | Start epoch 0
+ 2024-02-12,02:50:15 | INFO | Train Epoch: 0 [ 1024/27087 (1%)] Data (t): 12.525 Batch (t): 16.592, 15.4295/s, 15.4295/s/gpu LR: 0.000000 Logit Scale: 100.000 Contrastive_loss: 1.0551 (1.0551) Loss: 1.0551 (1.0551)
+ 2024-02-12,02:52:13 | INFO | Train Epoch: 0 [103424/27087 (96%)] Data (t): 0.645 Batch (t): 1.175, 459.500/s, 459.500/s/gpu LR: 0.000001 Logit Scale: 99.996 Contrastive_loss: 0.80440 (0.92975) Loss: 0.80440 (0.92975)
+ 2024-02-12,02:52:20 | INFO | Train Epoch: 0 [107520/27087 (100%)] Data (t): 1.439 Batch (t): 1.884, 43.6989/s, 43.6989/s/gpu LR: 0.000001 Logit Scale: 99.996 Contrastive_loss: 0.73623 (0.86524) Loss: 0.73623 (0.86524)
+ 2024-02-12,02:52:21 | INFO | Start epoch 1
+ 2024-02-12,02:52:33 | INFO | Train Epoch: 1 [ 1024/27087 (1%)] Data (t): 11.817 Batch (t): 12.154, 21.0639/s, 21.0639/s/gpu LR: 0.000001 Logit Scale: 99.995 Contrastive_loss: 0.75390 (0.75390) Loss: 0.75390 (0.75390)
+ 2024-02-12,02:54:37 | INFO | Train Epoch: 1 [103424/27087 (96%)] Data (t): 0.740 Batch (t): 1.238, 460.135/s, 460.135/s/gpu LR: 0.000002 Logit Scale: 99.988 Contrastive_loss: 0.65958 (0.70674) Loss: 0.65958 (0.70674)
+ 2024-02-12,02:54:39 | INFO | Train Epoch: 1 [107520/27087 (100%)] Data (t): 0.058 Batch (t): 0.557, 459.304/s, 459.304/s/gpu LR: 0.000002 Logit Scale: 99.988 Contrastive_loss: 0.64635 (0.68661) Loss: 0.64635 (0.68661)
+ 2024-02-12,02:54:39 | INFO | Start epoch 2
+ 2024-02-12,02:54:51 | INFO | Train Epoch: 2 [ 1024/27087 (1%)] Data (t): 11.166 Batch (t): 11.505, 22.2512/s, 22.2512/s/gpu LR: 0.000002 Logit Scale: 99.988 Contrastive_loss: 0.53999 (0.53999) Loss: 0.53999 (0.53999)
+ 2024-02-12,02:56:51 | INFO | Train Epoch: 2 [103424/27087 (96%)] Data (t): 0.696 Batch (t): 1.195, 459.292/s, 459.292/s/gpu LR: 0.000003 Logit Scale: 99.983 Contrastive_loss: 0.56759 (0.55379) Loss: 0.56759 (0.55379)
+ 2024-02-12,02:56:54 | INFO | Train Epoch: 2 [107520/27087 (100%)] Data (t): 0.387 Batch (t): 0.888, 457.597/s, 457.597/s/gpu LR: 0.000003 Logit Scale: 99.983 Contrastive_loss: 0.48756 (0.53171) Loss: 0.48756 (0.53171)
+ 2024-02-12,02:56:55 | INFO | Start epoch 3
+ 2024-02-12,02:57:07 | INFO | Train Epoch: 3 [ 1024/27087 (1%)] Data (t): 11.677 Batch (t): 12.022, 21.2941/s, 21.2941/s/gpu LR: 0.000003 Logit Scale: 99.983 Contrastive_loss: 0.44987 (0.44987) Loss: 0.44987 (0.44987)
+ 2024-02-12,02:59:10 | INFO | Train Epoch: 3 [103424/27087 (96%)] Data (t): 0.718 Batch (t): 1.230, 459.886/s, 459.886/s/gpu LR: 0.000004 Logit Scale: 99.981 Contrastive_loss: 0.42789 (0.43888) Loss: 0.42789 (0.43888)
+ 2024-02-12,02:59:12 | INFO | Train Epoch: 3 [107520/27087 (100%)] Data (t): 0.058 Batch (t): 0.558, 459.170/s, 459.170/s/gpu LR: 0.000004 Logit Scale: 99.980 Contrastive_loss: 0.42664 (0.43480) Loss: 0.42664 (0.43480)
+ 2024-02-12,02:59:12 | INFO | Start epoch 4
+ 2024-02-12,02:59:24 | INFO | Train Epoch: 4 [ 1024/27087 (1%)] Data (t): 11.325 Batch (t): 11.659, 21.9575/s, 21.9575/s/gpu LR: 0.000004 Logit Scale: 99.980 Contrastive_loss: 0.34311 (0.34311) Loss: 0.34311 (0.34311)
+ 2024-02-12,03:01:24 | INFO | Train Epoch: 4 [103424/27087 (96%)] Data (t): 0.712 Batch (t): 1.198, 459.840/s, 459.840/s/gpu LR: 0.000005 Logit Scale: 99.989 Contrastive_loss: 0.32785 (0.33548) Loss: 0.32785 (0.33548)
+ 2024-02-12,03:01:27 | INFO | Train Epoch: 4 [107520/27087 (100%)] Data (t): 0.180 Batch (t): 0.623, 313.004/s, 313.004/s/gpu LR: 0.000005 Logit Scale: 99.989 Contrastive_loss: 0.36298 (0.34464) Loss: 0.36298 (0.34464)
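The `Train Epoch` lines in out.log follow a fixed format, so the loss trajectory (contrastive loss falling from ~1.06 to ~0.34 over the 5 epochs) can be recovered with a small regex. A sketch, assuming only the format observed in this log:

```python
import re

# "Contrastive_loss: <current> (<running average>)" as printed in out.log above.
LOSS_RE = re.compile(r"Contrastive_loss: ([0-9.]+) \(([0-9.]+)\)")

def extract_loss(line: str):
    """Return (current, running_average) for a Train Epoch line, or None."""
    m = LOSS_RE.search(line)
    return (float(m.group(1)), float(m.group(2))) if m else None

line = ("2024-02-12,03:01:27 | INFO | Train Epoch: 4 [107520/27087 (100%)] "
        "Contrastive_loss: 0.36298 (0.34464) Loss: 0.36298 (0.34464)")
current, running = extract_loss(line)
```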
params.txt ADDED
@@ -0,0 +1,96 @@
+ accum_freq: 1
+ aug_cfg: {}
+ batch_size: 256
+ beta1: 0.9
+ beta2: 0.98
+ checkpoint_path: ./logs/2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16/checkpoints
+ coca_caption_loss_weight: 2.0
+ coca_contrastive_loss_weight: 1.0
+ copy_codebase: False
+ csv_caption_key: captions
+ csv_img_key: images
+ csv_separator:
+ dataset_resampled: False
+ dataset_type: auto
+ ddp_static_graph: True
+ debug: False
+ delete_previous_checkpoint: False
+ device: cuda:0
+ dist_backend: nccl
+ dist_url: env://
+ distill: False
+ distill_model: None
+ distill_pretrained: None
+ distributed: False
+ epochs: 5
+ epochs_cooldown: None
+ eps: 1e-06
+ force_custom_text: False
+ force_image_size: None
+ force_patch_dropout: None
+ force_quick_gelu: False
+ gather_with_grad: True
+ grad_checkpointing: False
+ grad_clip_norm: None
+ horovod: False
+ image_interpolation: None
+ image_mean: None
+ image_resize_mode: None
+ image_std: None
+ imagenet_v2: None
+ imagenet_val: None
+ local_loss: True
+ local_rank: 0
+ lock_image: False
+ lock_image_freeze_bn_stats: False
+ lock_image_unlocked_groups: 0
+ lock_text: False
+ lock_text_freeze_layer_norm: False
+ lock_text_unlocked_layers: 0
+ log_every_n_steps: 100
+ log_level: 20
+ log_local: False
+ log_path: ./logs/2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16/out.log
+ logs: ./logs/
+ lr: 1e-05
+ lr_cooldown_end: 0.0
+ lr_cooldown_power: 1.0
+ lr_scheduler: cosine
+ model: ViT-B-32
+ name: 2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16
+ no_set_device_rank: False
+ precision: amp_bf16
+ pretrained: laion2b_s34b_b79k
+ pretrained_image: False
+ rank: 0
+ remote_sync: None
+ remote_sync_frequency: 300
+ remote_sync_protocol: s3
+ report_to:
+ resume: None
+ save_frequency: 5
+ save_most_recent: False
+ seed: 0
+ siglip: False
+ skip_scheduler: False
+ tensorboard: False
+ tensorboard_path:
+ torchcompile: False
+ torchscript: False
+ trace: False
+ train_data: ../../train_data_counterfactuals_neg_clip2.csv
+ train_data_upsampling_factors: None
+ train_num_samples: None
+ use_bn_sync: False
+ use_bnb_linear: None
+ val_data: None
+ val_frequency: 5
+ val_num_samples: None
+ wandb: False
+ wandb_notes:
+ wandb_project_name: open-clip
+ warmup: 1024
+ wd: 0.2
+ workers: 8
+ world_size: 1
+ zeroshot_frequency: 5
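params.txt is plain `key: value` text, one option per line. A small sketch of reading it back into a Python dict, coercing the booleans, `None`, and numeric literals that appear above (the `load_params` helper name is hypothetical, not part of open_clip):

```python
def load_params(text: str) -> dict:
    """Parse open_clip-style 'key: value' lines into a dict with coerced values."""
    params = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, raw = line.partition(":")  # split on the first colon only
        raw = raw.strip()
        if raw == "True":
            value = True
        elif raw == "False":
            value = False
        elif raw == "None":
            value = None
        else:
            try:
                value = int(raw)
            except ValueError:
                try:
                    value = float(raw)  # handles e.g. "1e-05"
                except ValueError:
                    value = raw  # leave strings (paths, model names) as-is
        params[key.strip()] = value
    return params

sample = "batch_size: 256\nlr: 1e-05\nwandb: False\nresume: None\nmodel: ViT-B-32\ndevice: cuda:0"
p = load_params(sample)
```

Splitting on the first colon only keeps values like `cuda:0` and `env://` intact.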