birder-project/vitreg4_b16_mim

Image Feature Extraction

hassonofer committed on Jan 28

Commit 1404ff7 · verified · 1 Parent(s): da3cd7a

Update README.md

Files changed (1)

README.md +10 -1

@@ -16,7 +16,16 @@ A ViTReg4-B16 image encoder pre-trained using Masked Image Modeling (MIM). This
 - **Model Stats:**
     - Params (M): 85.8
     - Input image size: 224 x 224
-- **Dataset:** Trained on a diverse dataset of approximately 11M images, including a substantial collection of bird imagery (50% of the dataset)
+- **Dataset:** Trained on a diverse dataset of approximately 11M images, including:
+    - iNaturalist 2021 (~3.3M)
+    - WebVision-2.0 (~1.5M random subset)
+    - imagenet-w21-webp-wds (~1M random subset)
+    - SA-1B (~220K random subset of 20 chunks)
+    - COCO (~120K)
+    - NABirds (~48K)
+    - Birdsnap v1.1 (~44K)
+    - CUB-200 2011 (~18K)
+    - The Birder dataset (~5M, private dataset)
 - **Papers:**
     - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: <https://arxiv.org/abs/2010.11929>
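
As a quick sanity check on the "approximately 11M images" figure, the per-source counts added in this commit can be tallied. The sketch below is only illustrative: the numbers are the rounded values from the new README list, and the `DATASET_SIZES` dictionary and `summarize` helper are hypothetical names introduced here, not part of the Birder project.

```python
# Approximate image counts per source, copied from the updated README list.
# These are rounded figures ("~"), so the totals below are rough as well.
DATASET_SIZES = {
    "iNaturalist 2021": 3_300_000,
    "WebVision-2.0": 1_500_000,
    "imagenet-w21-webp-wds": 1_000_000,
    "SA-1B": 220_000,
    "COCO": 120_000,
    "NABirds": 48_000,
    "Birdsnap v1.1": 44_000,
    "CUB-200 2011": 18_000,
    "The Birder dataset": 5_000_000,
}


def summarize(sizes):
    """Return the total image count and each source's share of that total."""
    total = sum(sizes.values())
    shares = {name: count / total for name, count in sizes.items()}
    return total, shares


total, shares = summarize(DATASET_SIZES)
print(f"total ~ {total / 1e6:.2f}M images")

# List the sources from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} {share:6.1%}")
```

With the rounded counts above, the total comes to about 11.25M, which is consistent with the "approximately 11M" stated in the README.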