Image Feature Extraction
Birder
PyTorch
hassonofer committed on
Commit 1d7db34 · verified · 1 Parent(s): 83a778f

Update README.md

Files changed (1)
  1. README.md +70 -3
README.md CHANGED
@@ -1,3 +1,70 @@
- ---
- license: apache-2.0
- ---
+ ---
+ tags:
+ - image-feature-extraction
+ - birder
+ library_name: birder
+ license: apache-2.0
+ ---
+
+ # Model Card for vit_l16_mim
+
+ A ViT-L16 image encoder pre-trained using Masked Image Modeling (MIM). This model has *not* been fine-tuned for a specific classification task and is intended to be used as a general-purpose feature extractor or a backbone for downstream tasks like object detection, segmentation, or custom classification.
+
+ ## Model Details
+
+ - **Model Type:** Image encoder
+ - **Model Stats:**
+     - Params (M): 303.3
+     - Input image size: 224 x 224
+ - **Dataset:** Trained on a diverse dataset of approximately 11M images, including a substantial collection of bird imagery (50% of the dataset)
+
+ - **Papers:**
+     - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: <https://arxiv.org/abs/2010.11929>
+     - Masked Autoencoders Are Scalable Vision Learners: <https://arxiv.org/abs/2111.06377>
+
+ ## Model Usage
+
+ ### Image Embeddings
+
+ ```python
+ import birder
+ import torch
+ from PIL import Image
+
+ # Load the pre-trained encoder along with its signature and RGB normalization stats
+ (net, _, signature, rgb_stats) = birder.load_pretrained_model("vit_l16_mim_200", inference=True)
+
+ # Get the image size the model was trained on
+ size = birder.get_size_from_signature(signature)
+
+ # Create an inference transform
+ transform = birder.classification_transform(size, rgb_stats)
+
+ image = Image.open("path/to/image.jpeg")
+ input_tensor = transform(image).unsqueeze(dim=0)  # add a batch dimension
+ with torch.inference_mode():
+     embedding = net.embedding(input_tensor)
+     # embedding is a tensor with shape of (1, embedding_size)
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @misc{dosovitskiy2021imageworth16x16words,
+       title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
+       author={Alexey Dosovitskiy and Lucas Beyer and Alexander Kolesnikov and Dirk Weissenborn and Xiaohua Zhai and Thomas Unterthiner and Mostafa Dehghani and Matthias Minderer and Georg Heigold and Sylvain Gelly and Jakob Uszkoreit and Neil Houlsby},
+       year={2021},
+       eprint={2010.11929},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV},
+       url={https://arxiv.org/abs/2010.11929},
+ }
+
+ @misc{he2021maskedautoencodersscalablevision,
+       title={Masked Autoencoders Are Scalable Vision Learners},
+       author={Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Dollár and Ross Girshick},
+       year={2021},
+       eprint={2111.06377},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV},
+       url={https://arxiv.org/abs/2111.06377},
+ }
+ ```
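
A note on the Image Embeddings snippet in this README: the model card describes the network as a general-purpose feature extractor, and one common way to use such embeddings is to compare two images by cosine similarity. The following is a minimal sketch, not part of the commit and not a birder API. The `cosine_similarity` helper and the toy vectors are hypothetical, and it assumes each embedding has been converted to a plain list of floats (for example with `embedding[0].tolist()`). It uses only the standard library so it runs without a model download.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 4-dimensional stand-ins for real embeddings (hypothetical values).
# With the README snippet, a vector would come from embedding[0].tolist().
query = [0.2, 0.1, 0.0, 0.7]
same_direction = [0.4, 0.2, 0.0, 1.4]  # query scaled by 2
orthogonal = [0.0, 0.0, 1.0, 0.0]      # no overlap with query

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
```

When the embeddings stay as PyTorch tensors, `torch.nn.functional.cosine_similarity` computes the same quantity over batches and is likely the more practical choice.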