HanSolo9682 committed on
Commit 09a501b · 1 Parent(s): 5b34380

upload clip model

Files changed (3)
  1. checkpoints/epoch_5.pt +3 -0
  2. out.log +167 -0
  3. params.txt +96 -0
checkpoints/epoch_5.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1b9dc2e7c7505183c19352be2fabb78980bd337c510b029e2528c33ae84e3555
+ size 1815697682
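checkpoints/epoch_5.pt is stored as a Git LFS pointer, so the repository itself holds only the three `key value` lines above rather than the ~1.8 GB weights. A minimal sketch of reading such a pointer's fields (the `parse_lfs_pointer` helper is illustrative, not part of this repo):

```python
# Parse a Git LFS pointer file (spec v1): each line is "key value".
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:1b9dc2e7c7505183c19352be2fabb78980bd337c510b029e2528c33ae84e3555\n"
    "size 1815697682\n"
)
info = parse_lfs_pointer(pointer)  # info["size"] is the blob size in bytes, as a string
```

Fetching the actual checkpoint (e.g. via `git lfs pull` or the Hub download URL) replaces this pointer with the real file.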
out.log ADDED
@@ -0,0 +1,167 @@
+ 2024-02-12,02:49:55 | INFO | Running with a single process. Device cuda:0.
+ 2024-02-12,02:49:55 | INFO | Loaded ViT-B-32 model config.
+ 2024-02-12,02:49:58 | INFO | Loading pretrained ViT-B-32 weights (laion2b_s34b_b79k).
+ 2024-02-12,02:49:58 | INFO | Model:
+ 2024-02-12,02:49:58 | INFO | CLIP(
+   (visual): VisionTransformer(
+     (conv1): Conv2d(3, 768, kernel_size=(32, 32), stride=(32, 32), bias=False)
+     (patch_dropout): Identity()
+     (ln_pre): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+     (transformer): Transformer(
+       (resblocks): ModuleList(
+         (0-11): 12 x ResidualAttentionBlock(
+           (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+           (attn): MultiheadAttention(
+             (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+           )
+           (ls_1): Identity()
+           (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+           (mlp): Sequential(
+             (c_fc): Linear(in_features=768, out_features=3072, bias=True)
+             (gelu): GELU(approximate='none')
+             (c_proj): Linear(in_features=3072, out_features=768, bias=True)
+           )
+           (ls_2): Identity()
+         )
+       )
+     )
+     (ln_post): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+   )
+   (transformer): Transformer(
+     (resblocks): ModuleList(
+       (0-11): 12 x ResidualAttentionBlock(
+         (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
+         (attn): MultiheadAttention(
+           (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
+         )
+         (ls_1): Identity()
+         (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
+         (mlp): Sequential(
+           (c_fc): Linear(in_features=512, out_features=2048, bias=True)
+           (gelu): GELU(approximate='none')
+           (c_proj): Linear(in_features=2048, out_features=512, bias=True)
+         )
+         (ls_2): Identity()
+       )
+     )
+   )
+   (token_embedding): Embedding(49408, 512)
+   (ln_final): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
+ )
+ 2024-02-12,02:49:58 | INFO | Params:
+ 2024-02-12,02:49:58 | INFO | accum_freq: 1
+ 2024-02-12,02:49:58 | INFO | aug_cfg: {}
+ 2024-02-12,02:49:58 | INFO | batch_size: 256
+ 2024-02-12,02:49:58 | INFO | beta1: 0.9
+ 2024-02-12,02:49:58 | INFO | beta2: 0.98
+ 2024-02-12,02:49:58 | INFO | checkpoint_path: ./logs/2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16/checkpoints
+ 2024-02-12,02:49:58 | INFO | coca_caption_loss_weight: 2.0
+ 2024-02-12,02:49:58 | INFO | coca_contrastive_loss_weight: 1.0
+ 2024-02-12,02:49:58 | INFO | copy_codebase: False
+ 2024-02-12,02:49:58 | INFO | csv_caption_key: captions
+ 2024-02-12,02:49:58 | INFO | csv_img_key: images
+ 2024-02-12,02:49:58 | INFO | csv_separator:
+ 2024-02-12,02:49:58 | INFO | dataset_resampled: False
+ 2024-02-12,02:49:58 | INFO | dataset_type: auto
+ 2024-02-12,02:49:58 | INFO | ddp_static_graph: True
+ 2024-02-12,02:49:58 | INFO | debug: False
+ 2024-02-12,02:49:58 | INFO | delete_previous_checkpoint: False
+ 2024-02-12,02:49:58 | INFO | device: cuda:0
+ 2024-02-12,02:49:58 | INFO | dist_backend: nccl
+ 2024-02-12,02:49:58 | INFO | dist_url: env://
+ 2024-02-12,02:49:58 | INFO | distill: False
+ 2024-02-12,02:49:58 | INFO | distill_model: None
+ 2024-02-12,02:49:58 | INFO | distill_pretrained: None
+ 2024-02-12,02:49:58 | INFO | distributed: False
+ 2024-02-12,02:49:58 | INFO | epochs: 5
+ 2024-02-12,02:49:58 | INFO | epochs_cooldown: None
+ 2024-02-12,02:49:58 | INFO | eps: 1e-06
+ 2024-02-12,02:49:58 | INFO | force_custom_text: False
+ 2024-02-12,02:49:58 | INFO | force_image_size: None
+ 2024-02-12,02:49:58 | INFO | force_patch_dropout: None
+ 2024-02-12,02:49:58 | INFO | force_quick_gelu: False
+ 2024-02-12,02:49:58 | INFO | gather_with_grad: True
+ 2024-02-12,02:49:58 | INFO | grad_checkpointing: False
+ 2024-02-12,02:49:58 | INFO | grad_clip_norm: None
+ 2024-02-12,02:49:58 | INFO | horovod: False
+ 2024-02-12,02:49:58 | INFO | image_interpolation: None
+ 2024-02-12,02:49:58 | INFO | image_mean: None
+ 2024-02-12,02:49:58 | INFO | image_resize_mode: None
+ 2024-02-12,02:49:58 | INFO | image_std: None
+ 2024-02-12,02:49:58 | INFO | imagenet_v2: None
+ 2024-02-12,02:49:58 | INFO | imagenet_val: None
+ 2024-02-12,02:49:58 | INFO | local_loss: True
+ 2024-02-12,02:49:58 | INFO | local_rank: 0
+ 2024-02-12,02:49:58 | INFO | lock_image: False
+ 2024-02-12,02:49:58 | INFO | lock_image_freeze_bn_stats: False
+ 2024-02-12,02:49:58 | INFO | lock_image_unlocked_groups: 0
+ 2024-02-12,02:49:58 | INFO | lock_text: False
+ 2024-02-12,02:49:58 | INFO | lock_text_freeze_layer_norm: False
+ 2024-02-12,02:49:58 | INFO | lock_text_unlocked_layers: 0
+ 2024-02-12,02:49:58 | INFO | log_every_n_steps: 100
+ 2024-02-12,02:49:58 | INFO | log_level: 20
+ 2024-02-12,02:49:58 | INFO | log_local: False
+ 2024-02-12,02:49:58 | INFO | log_path: ./logs/2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16/out.log
+ 2024-02-12,02:49:58 | INFO | logs: ./logs/
+ 2024-02-12,02:49:58 | INFO | lr: 1e-05
+ 2024-02-12,02:49:58 | INFO | lr_cooldown_end: 0.0
+ 2024-02-12,02:49:58 | INFO | lr_cooldown_power: 1.0
+ 2024-02-12,02:49:58 | INFO | lr_scheduler: cosine
+ 2024-02-12,02:49:58 | INFO | model: ViT-B-32
+ 2024-02-12,02:49:58 | INFO | name: 2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16
+ 2024-02-12,02:49:58 | INFO | no_set_device_rank: False
+ 2024-02-12,02:49:58 | INFO | precision: amp_bf16
+ 2024-02-12,02:49:58 | INFO | pretrained: laion2b_s34b_b79k
+ 2024-02-12,02:49:58 | INFO | pretrained_image: False
+ 2024-02-12,02:49:58 | INFO | rank: 0
+ 2024-02-12,02:49:58 | INFO | remote_sync: None
+ 2024-02-12,02:49:58 | INFO | remote_sync_frequency: 300
+ 2024-02-12,02:49:58 | INFO | remote_sync_protocol: s3
+ 2024-02-12,02:49:58 | INFO | report_to:
+ 2024-02-12,02:49:58 | INFO | resume: None
+ 2024-02-12,02:49:58 | INFO | save_frequency: 5
+ 2024-02-12,02:49:58 | INFO | save_most_recent: False
+ 2024-02-12,02:49:58 | INFO | seed: 0
+ 2024-02-12,02:49:58 | INFO | siglip: False
+ 2024-02-12,02:49:58 | INFO | skip_scheduler: False
+ 2024-02-12,02:49:58 | INFO | tensorboard: False
+ 2024-02-12,02:49:58 | INFO | tensorboard_path:
+ 2024-02-12,02:49:58 | INFO | torchcompile: False
+ 2024-02-12,02:49:58 | INFO | torchscript: False
+ 2024-02-12,02:49:58 | INFO | trace: False
+ 2024-02-12,02:49:58 | INFO | train_data: ../../train_data_counterfactuals_neg_clip2.csv
+ 2024-02-12,02:49:58 | INFO | train_data_upsampling_factors: None
+ 2024-02-12,02:49:58 | INFO | train_num_samples: None
+ 2024-02-12,02:49:58 | INFO | use_bn_sync: False
+ 2024-02-12,02:49:58 | INFO | use_bnb_linear: None
+ 2024-02-12,02:49:58 | INFO | val_data: None
+ 2024-02-12,02:49:58 | INFO | val_frequency: 5
+ 2024-02-12,02:49:58 | INFO | val_num_samples: None
+ 2024-02-12,02:49:58 | INFO | wandb: False
+ 2024-02-12,02:49:58 | INFO | wandb_notes:
+ 2024-02-12,02:49:58 | INFO | wandb_project_name: open-clip
+ 2024-02-12,02:49:58 | INFO | warmup: 1024
+ 2024-02-12,02:49:58 | INFO | wd: 0.2
+ 2024-02-12,02:49:58 | INFO | workers: 8
+ 2024-02-12,02:49:58 | INFO | world_size: 1
+ 2024-02-12,02:49:58 | INFO | zeroshot_frequency: 5
+ 2024-02-12,02:49:58 | INFO | Start epoch 0
+ 2024-02-12,02:50:15 | INFO | Train Epoch: 0 [ 1024/27087 (1%)] Data (t): 12.525 Batch (t): 16.592, 15.4295/s, 15.4295/s/gpu LR: 0.000000 Logit Scale: 100.000 Contrastive_loss: 1.0551 (1.0551) Loss: 1.0551 (1.0551)
+ 2024-02-12,02:52:13 | INFO | Train Epoch: 0 [103424/27087 (96%)] Data (t): 0.645 Batch (t): 1.175, 459.500/s, 459.500/s/gpu LR: 0.000001 Logit Scale: 99.996 Contrastive_loss: 0.80440 (0.92975) Loss: 0.80440 (0.92975)
+ 2024-02-12,02:52:20 | INFO | Train Epoch: 0 [107520/27087 (100%)] Data (t): 1.439 Batch (t): 1.884, 43.6989/s, 43.6989/s/gpu LR: 0.000001 Logit Scale: 99.996 Contrastive_loss: 0.73623 (0.86524) Loss: 0.73623 (0.86524)
+ 2024-02-12,02:52:21 | INFO | Start epoch 1
+ 2024-02-12,02:52:33 | INFO | Train Epoch: 1 [ 1024/27087 (1%)] Data (t): 11.817 Batch (t): 12.154, 21.0639/s, 21.0639/s/gpu LR: 0.000001 Logit Scale: 99.995 Contrastive_loss: 0.75390 (0.75390) Loss: 0.75390 (0.75390)
+ 2024-02-12,02:54:37 | INFO | Train Epoch: 1 [103424/27087 (96%)] Data (t): 0.740 Batch (t): 1.238, 460.135/s, 460.135/s/gpu LR: 0.000002 Logit Scale: 99.988 Contrastive_loss: 0.65958 (0.70674) Loss: 0.65958 (0.70674)
+ 2024-02-12,02:54:39 | INFO | Train Epoch: 1 [107520/27087 (100%)] Data (t): 0.058 Batch (t): 0.557, 459.304/s, 459.304/s/gpu LR: 0.000002 Logit Scale: 99.988 Contrastive_loss: 0.64635 (0.68661) Loss: 0.64635 (0.68661)
+ 2024-02-12,02:54:39 | INFO | Start epoch 2
+ 2024-02-12,02:54:51 | INFO | Train Epoch: 2 [ 1024/27087 (1%)] Data (t): 11.166 Batch (t): 11.505, 22.2512/s, 22.2512/s/gpu LR: 0.000002 Logit Scale: 99.988 Contrastive_loss: 0.53999 (0.53999) Loss: 0.53999 (0.53999)
+ 2024-02-12,02:56:51 | INFO | Train Epoch: 2 [103424/27087 (96%)] Data (t): 0.696 Batch (t): 1.195, 459.292/s, 459.292/s/gpu LR: 0.000003 Logit Scale: 99.983 Contrastive_loss: 0.56759 (0.55379) Loss: 0.56759 (0.55379)
+ 2024-02-12,02:56:54 | INFO | Train Epoch: 2 [107520/27087 (100%)] Data (t): 0.387 Batch (t): 0.888, 457.597/s, 457.597/s/gpu LR: 0.000003 Logit Scale: 99.983 Contrastive_loss: 0.48756 (0.53171) Loss: 0.48756 (0.53171)
+ 2024-02-12,02:56:55 | INFO | Start epoch 3
+ 2024-02-12,02:57:07 | INFO | Train Epoch: 3 [ 1024/27087 (1%)] Data (t): 11.677 Batch (t): 12.022, 21.2941/s, 21.2941/s/gpu LR: 0.000003 Logit Scale: 99.983 Contrastive_loss: 0.44987 (0.44987) Loss: 0.44987 (0.44987)
+ 2024-02-12,02:59:10 | INFO | Train Epoch: 3 [103424/27087 (96%)] Data (t): 0.718 Batch (t): 1.230, 459.886/s, 459.886/s/gpu LR: 0.000004 Logit Scale: 99.981 Contrastive_loss: 0.42789 (0.43888) Loss: 0.42789 (0.43888)
+ 2024-02-12,02:59:12 | INFO | Train Epoch: 3 [107520/27087 (100%)] Data (t): 0.058 Batch (t): 0.558, 459.170/s, 459.170/s/gpu LR: 0.000004 Logit Scale: 99.980 Contrastive_loss: 0.42664 (0.43480) Loss: 0.42664 (0.43480)
+ 2024-02-12,02:59:12 | INFO | Start epoch 4
+ 2024-02-12,02:59:24 | INFO | Train Epoch: 4 [ 1024/27087 (1%)] Data (t): 11.325 Batch (t): 11.659, 21.9575/s, 21.9575/s/gpu LR: 0.000004 Logit Scale: 99.980 Contrastive_loss: 0.34311 (0.34311) Loss: 0.34311 (0.34311)
+ 2024-02-12,03:01:24 | INFO | Train Epoch: 4 [103424/27087 (96%)] Data (t): 0.712 Batch (t): 1.198, 459.840/s, 459.840/s/gpu LR: 0.000005 Logit Scale: 99.989 Contrastive_loss: 0.32785 (0.33548) Loss: 0.32785 (0.33548)
+ 2024-02-12,03:01:27 | INFO | Train Epoch: 4 [107520/27087 (100%)] Data (t): 0.180 Batch (t): 0.623, 313.004/s, 313.004/s/gpu LR: 0.000005 Logit Scale: 99.989 Contrastive_loss: 0.36298 (0.34464) Loss: 0.36298 (0.34464)
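The `Train Epoch` lines in out.log follow a fixed format, so the loss trajectory (contrastive loss falling from ~1.06 to ~0.34 over the 5 epochs) can be recovered with a small regex. A sketch, assuming only the format observed in this log:

```python
import re

# "Contrastive_loss: <current> (<running average>)" as printed in out.log above.
LOSS_RE = re.compile(r"Contrastive_loss: ([0-9.]+) \(([0-9.]+)\)")

def extract_loss(line: str):
    """Return (current, running_average) for a Train Epoch line, or None."""
    m = LOSS_RE.search(line)
    return (float(m.group(1)), float(m.group(2))) if m else None

line = ("2024-02-12,03:01:27 | INFO | Train Epoch: 4 [107520/27087 (100%)] "
        "Contrastive_loss: 0.36298 (0.34464) Loss: 0.36298 (0.34464)")
current, running = extract_loss(line)
```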
params.txt ADDED
@@ -0,0 +1,96 @@
+ accum_freq: 1
+ aug_cfg: {}
+ batch_size: 256
+ beta1: 0.9
+ beta2: 0.98
+ checkpoint_path: ./logs/2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16/checkpoints
+ coca_caption_loss_weight: 2.0
+ coca_contrastive_loss_weight: 1.0
+ copy_codebase: False
+ csv_caption_key: captions
+ csv_img_key: images
+ csv_separator:
+ dataset_resampled: False
+ dataset_type: auto
+ ddp_static_graph: True
+ debug: False
+ delete_previous_checkpoint: False
+ device: cuda:0
+ dist_backend: nccl
+ dist_url: env://
+ distill: False
+ distill_model: None
+ distill_pretrained: None
+ distributed: False
+ epochs: 5
+ epochs_cooldown: None
+ eps: 1e-06
+ force_custom_text: False
+ force_image_size: None
+ force_patch_dropout: None
+ force_quick_gelu: False
+ gather_with_grad: True
+ grad_checkpointing: False
+ grad_clip_norm: None
+ horovod: False
+ image_interpolation: None
+ image_mean: None
+ image_resize_mode: None
+ image_std: None
+ imagenet_v2: None
+ imagenet_val: None
+ local_loss: True
+ local_rank: 0
+ lock_image: False
+ lock_image_freeze_bn_stats: False
+ lock_image_unlocked_groups: 0
+ lock_text: False
+ lock_text_freeze_layer_norm: False
+ lock_text_unlocked_layers: 0
+ log_every_n_steps: 100
+ log_level: 20
+ log_local: False
+ log_path: ./logs/2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16/out.log
+ logs: ./logs/
+ lr: 1e-05
+ lr_cooldown_end: 0.0
+ lr_cooldown_power: 1.0
+ lr_scheduler: cosine
+ model: ViT-B-32
+ name: 2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16
+ no_set_device_rank: False
+ precision: amp_bf16
+ pretrained: laion2b_s34b_b79k
+ pretrained_image: False
+ rank: 0
+ remote_sync: None
+ remote_sync_frequency: 300
+ remote_sync_protocol: s3
+ report_to:
+ resume: None
+ save_frequency: 5
+ save_most_recent: False
+ seed: 0
+ siglip: False
+ skip_scheduler: False
+ tensorboard: False
+ tensorboard_path:
+ torchcompile: False
+ torchscript: False
+ trace: False
+ train_data: ../../train_data_counterfactuals_neg_clip2.csv
+ train_data_upsampling_factors: None
+ train_num_samples: None
+ use_bn_sync: False
+ use_bnb_linear: None
+ val_data: None
+ val_frequency: 5
+ val_num_samples: None
+ wandb: False
+ wandb_notes:
+ wandb_project_name: open-clip
+ warmup: 1024
+ wd: 0.2
+ workers: 8
+ world_size: 1
+ zeroshot_frequency: 5
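params.txt is plain `key: value` text, one option per line. A small sketch of reading it back into a Python dict, coercing the booleans, `None`, and numeric literals that appear above (the `load_params` helper name is hypothetical, not part of open_clip):

```python
def load_params(text: str) -> dict:
    """Parse open_clip-style 'key: value' lines into a dict with coerced values."""
    params = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, raw = line.partition(":")  # split on the first colon only
        raw = raw.strip()
        if raw == "True":
            value = True
        elif raw == "False":
            value = False
        elif raw == "None":
            value = None
        else:
            try:
                value = int(raw)
            except ValueError:
                try:
                    value = float(raw)  # handles e.g. "1e-05"
                except ValueError:
                    value = raw  # leave strings (paths, model names) as-is
        params[key.strip()] = value
    return params

sample = "batch_size: 256\nlr: 1e-05\nwandb: False\nresume: None\nmodel: ViT-B-32\ndevice: cuda:0"
p = load_params(sample)
```

Splitting on the first colon only keeps values like `cuda:0` and `env://` intact.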