Ross Wightman committed on
Commit
8075f1d
·
1 Parent(s): b6f7d8d

Update README

Files changed (1)
  1. README.md +85 -7
README.md CHANGED
@@ -6,11 +6,12 @@ license: mit
6
  # Table of Contents
7
 
8
  1. [Model Details](#model-details)
9
- 1. [Uses](#uses)
10
- 1. [Training Details](#training-details)
11
- 1. [Evaluation](#evaluation)
12
- 1. [Citation](#citation)
13
- 1. [How To Get Started With the Model](#how-to-get-started-with-the-model)
14
 
15
 
16
  # Model Details
@@ -19,6 +20,8 @@ license: mit
19
 
20
  A CLIP ViT L/14 model trained with the LAION-2B English subset of LAION-5B (https://laion.ai/blog/laion-5b/) using OpenCLIP (https://github.com/mlfoundations/open_clip).
21
 
22
  # Uses
23
 
24
  As per the original OpenAI CLIP models, this model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such a model.
@@ -55,7 +58,74 @@ This model was trained with the 2 Billion sample English subset of LAION-5B (htt
55
 
56
  ## Training Procedure
57
 
58
- **TODO** - add SLURM script, hparams.
59
 
60
  # Evaluation
61
 
@@ -71,7 +141,15 @@ The testing is performed with VTAB+ (A combination of VTAB (https://arxiv.org/ab
71
 
72
  ## Results
73
 
74
- **TODO** - full zero-shot and retrieval benchmark results
75
 
76
  # Citation
77
 
 
6
  # Table of Contents
7
 
8
  1. [Model Details](#model-details)
9
+ 2. [Uses](#uses)
10
+ 3. [Training Details](#training-details)
11
+ 4. [Evaluation](#evaluation)
12
+ 5. [Acknowledgements](#acknowledgements)
13
+ 6. [Citation](#citation)
14
+ 7. [How To Get Started With the Model](#how-to-get-started-with-the-model)
15
 
16
 
17
  # Model Details
 
20
 
21
  A CLIP ViT L/14 model trained with the LAION-2B English subset of LAION-5B (https://laion.ai/blog/laion-5b/) using OpenCLIP (https://github.com/mlfoundations/open_clip).
22
 
23
+ Model training ('babysitting') was done by Ross Wightman on the [JUWELS Booster](https://apps.fz-juelich.de/jsc/hps/juwels/booster-overview.html) supercomputer. See acknowledgements below.
24
+
25
  # Uses
26
 
27
  As per the original OpenAI CLIP models, this model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such a model.
 
58
 
59
  ## Training Procedure
60
 
61
+ The model was trained on 384 A100 GPUs using 'virtual' epochs of 200M samples each, with dataset shards sampled with replacement. Training ran for 160 virtual epochs, for a total of 32B samples seen.
62
+
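To make the idea of a 'virtual' epoch concrete, here is a minimal Python sketch of drawing shards with replacement until a fixed sample budget is covered. The helper names, the toy shard list, and the samples-per-shard figure are assumptions for illustration only; this is not OpenCLIP's data loader.

```python
import random


def virtual_epoch_draws(samples_per_epoch, samples_per_shard):
    """Number of shard draws needed to cover one virtual epoch (ceiling division)."""
    return -(-samples_per_epoch // samples_per_shard)


def resampled_shards(shards, num_draws, seed=0):
    """Draw shard names with replacement (hypothetical helper, not OpenCLIP's API)."""
    rng = random.Random(seed)
    return rng.choices(shards, k=num_draws)


# Toy numbers for illustration; the real run used 200M samples per virtual epoch.
shards = [f"{i:05d}.tar" for i in range(8)]
draws = virtual_epoch_draws(samples_per_epoch=50, samples_per_shard=10)
epoch = resampled_shards(shards, draws, seed=0)
print(draws, epoch)
```

Because shards are drawn with replacement, an epoch here is defined by the number of samples seen rather than by one full pass over the dataset, so some shards can repeat within an epoch while others are skipped.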
63
+ The first 68 epochs were trained with float16 AMP at a global batch size of 79K (208 per GPU). Training initially ran to epoch 75, where the loss spiked and training failed with NaN.
64
+
65
+ Romain Beaumont was training H/14 and g/14 models at the same time on the Stability cluster and hit similar instabilities. Collectively, we tried restarts with:
66
+ * different dataset shuffle seed
67
+ * different LR
68
+ * gradient clipping
69
+ * modifications to the architecture
70
+   * Norm modifications (stable norm for final, post embed norm for text transformer) as per https://github.com/mlfoundations/open_clip/pull/153 thanks to Phil Wang
71
+   * Extra attention block norms ala Normformer (https://arxiv.org/abs/2110.09456)
72
+   * Scaled cosine attention ala Swin-V2 (https://arxiv.org/abs/2111.09883)
73
+
74
+ None of the above ended up working. Most blew up within the same epoch as the original run, with the exception of the architecture mods:
75
+ * Normformer mods significantly altered the network such that resuming did not quickly converge to previous performance; this was abandoned but might be worth trying from the start.
76
+ * Scaled cosine attention initially looked promising and lasted until epoch 90 before the loss suddenly increased and appeared to remain 'stuck'.
77
+
78
+ In the end, restarting at epoch 69 with `float32` precision solved all instabilities and training continued from there with global batch size 86k (224 per GPU). On A100 GPUs, `float32` had a minimal impact on the throughput once `tf32` matmuls were enabled in PyTorch; it was approximately 10% slower than `float16 AMP`. Romain similarly changed the precision but ended up using `bfloat16 AMP` to resolve issues.
79
+
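The batch-size and sample-count figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in this README; the function and constant names are illustrative assumptions, not part of OpenCLIP.

```python
# Figures taken from the README text above.
NUM_GPUS = 384
SAMPLES_PER_VIRTUAL_EPOCH = 200_000_000
VIRTUAL_EPOCHS = 160


def global_batch_size(per_gpu_batch, num_gpus=NUM_GPUS):
    """Global batch size is the per-GPU batch times the number of GPUs."""
    return per_gpu_batch * num_gpus


fp16_global_batch = global_batch_size(208)  # 79872, quoted as "79K"
fp32_global_batch = global_batch_size(224)  # 86016, quoted as "86k"
total_samples_seen = SAMPLES_PER_VIRTUAL_EPOCH * VIRTUAL_EPOCHS  # 32 billion

print(fp16_global_batch, fp32_global_batch, total_samples_seen)
```

The per-GPU value of 224 is the one that appears as `--batch-size=224` in the script that follows.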
80
+ ### SLURM Script
81
+
82
+ ```
83
+ #SBATCH --nodes=96
84
+ #SBATCH --gres=gpu:4
85
+ #SBATCH --ntasks-per-node=4
86
+ #SBATCH --cpus-per-task=6
87
+ #SBATCH --wait-all-nodes=1
88
+ #SBATCH --job-name=open_clip_laion2b
89
+
90
+ # load low-level libraries
91
+ ml purge
92
+ source /conda/bin/activate pytorch-112
93
+
94
+ export NCCL_ASYNC_ERROR_HANDLING=1
95
+ export CUDA_VISIBLE_DEVICES=0,1,2,3
96
+ export MASTER_PORT=12802
97
+
98
+ ### get the first node name as master address - customized for vgg slurm
99
+ ### e.g. master(gnodee[2-5],gnoded1) == gnodee2
100
+ echo "NODELIST="${SLURM_NODELIST}
101
+ master_addr=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
102
+ export MASTER_ADDR=$master_addr"i"
103
+ echo "MASTER_ADDR="$MASTER_ADDR
104
+
105
+ cd /home/me/open_clip
106
+ export PYTHONPATH="$PYTHONPATH:$PWD/src"
107
+
108
+ srun --cpu_bind=none,v --accel-bind=gn python -u src/training/main.py \
109
+ --save-frequency 1 \
110
+ --zeroshot-frequency 1 \
111
+ --train-data="/data/laion2B-en/{00000..23295}.tar" \
112
+ --train-num-samples=200000000 \
113
+ --warmup 10000 \
114
+ --lr "1e-3" \
115
+ --batch-size=224 \
116
+ --epochs=160 \
117
+ --workers=6 \
118
+ --model ViT-L-14 \
119
+ --name "L14-laion2B" \
120
+ --report-to "tensorboard" \
121
+ --seed 0 \
122
+ --precision 'fp32' \
123
+ --ddp-static-graph \
124
+ --local-loss \
125
+ --dataset-resampled \
126
+ --gather-with-grad \
127
+ --grad-checkpointing
128
+ ```
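The `--train-data` argument in the script uses a brace range, `{00000..23295}`, to name the dataset shards. As a rough illustration of what that pattern covers, the Python sketch below expands a single numeric brace range while keeping the zero padding. The helper `expand_numeric_braces` is a hypothetical stand-in written for this example; the training code is expected to expand the pattern through its own data-loading library.

```python
import re


def expand_numeric_braces(pattern):
    """Expand one '{start..end}' numeric range, preserving zero padding.

    Hypothetical helper for illustration; not the expansion code used in training.
    """
    match = re.search(r"\{(\d+)\.\.(\d+)\}", pattern)
    if match is None:
        return [pattern]  # nothing to expand
    start, end = match.group(1), match.group(2)
    width = len(start)  # pad every index to the width of the start value
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [f"{prefix}{i:0{width}d}{suffix}" for i in range(int(start), int(end) + 1)]


shards = expand_numeric_braces("/data/laion2B-en/{00000..23295}.tar")
print(len(shards), shards[0], shards[-1])
```

Under this reading the pattern names 23296 tar shards, from `00000.tar` through `23295.tar`.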
129
 
130
  # Evaluation
131
 
 
141
 
142
  ## Results
143
 
144
+ The model achieves 75.3% zero-shot top-1 accuracy on ImageNet-1k.
145
+
146
+ An initial round of benchmarks has been performed on a wider range of datasets, currently viewable at https://github.com/LAION-AI/CLIP_benchmark/blob/main/benchmark/results.ipynb
147
+
148
+ **TODO** - create table for just this model's metrics.
149
+
150
+ # Acknowledgements
151
+
152
+ We acknowledge the Gauss Centre for Supercomputing e.V. (http://gauss-centre.eu) for funding this part of the work by providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer JUWELS Booster at Jülich Supercomputing Centre (JSC).
153
 
154
  # Citation
155