Update README.md
README.md
CHANGED
@@ -16,7 +16,16 @@ A ViTReg4-B16 image encoder pre-trained using Masked Image Modeling (MIM). This
 - **Model Stats:**
     - Params (M): 85.8
     - Input image size: 224 x 224
-- **Dataset:** Trained on a diverse dataset of approximately 11M images, including
+- **Dataset:** Trained on a diverse dataset of approximately 11M images, including:
+    - iNaturalist 2021 (~3.3M)
+    - WebVision-2.0 (~1.5M random subset)
+    - imagenet-w21-webp-wds (~1M random subset)
+    - SA-1B (~220K random subset of 20 chunks)
+    - COCO (~120K)
+    - NABirds (~48K)
+    - Birdsnap v1.1 (~44K)
+    - CUB-200 2011 (~18K)
+    - The Birder dataset (~5M, private dataset)
 
 - **Papers:**
     - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: <https://arxiv.org/abs/2010.11929>
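
As a quick sanity check on the "approximately 11M" figure, the per-subset counts listed in the updated README can be summed. This is a minimal sketch: the dictionary name `subset_sizes` is illustrative, and the counts are the rounded values from the list above, not exact image counts.

```python
# Approximate image counts copied from the updated README (rounded values,
# so the total is only an estimate).
subset_sizes = {
    "iNaturalist 2021": 3_300_000,
    "WebVision-2.0": 1_500_000,
    "imagenet-w21-webp-wds": 1_000_000,
    "SA-1B": 220_000,
    "COCO": 120_000,
    "NABirds": 48_000,
    "Birdsnap v1.1": 44_000,
    "CUB-200 2011": 18_000,
    "The Birder dataset": 5_000_000,
}

# Sum the subsets and compare against the stated ~11M.
total = sum(subset_sizes.values())
print(f"Total: ~{total / 1e6:.2f}M images")

# Share of each subset in the combined pre-training set, largest first.
for name, count in sorted(subset_sizes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} {count / total:6.1%}")
```

The rounded counts add up to about 11.25M, which is consistent with the stated total.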