amaye15 committed
Commit f3003dd · verified · 1 Parent(s): a47cb18

Update README.md

Files changed (1)
  1. README.md +77 -3
README.md CHANGED @@ -1,3 +1,77 @@

Removed (old lines 1-3):

```yaml
---
license: mit
---
```

Added (new lines 1-77):

```yaml
---
license: mit
base_model:
- apple/aimv2-large-patch14-native
pipeline_tag: image-classification
tags:
- image-classification
- vision
---
```

# AIMv2-Large-Patch14-Native Image Classification

[Original AIMv2 Paper](https://arxiv.org/abs/2411.14402) | [BibTeX](#citation)

This repository contains an adapted version of the original AIMv2 model. It has been modified so that it can be loaded with the `AutoModelForImageClassification` class from Hugging Face Transformers and used directly for image classification.

## Introduction

We have adapted the original `apple/aimv2-large-patch14-native` model to work with `AutoModelForImageClassification`. The AIMv2 family consists of vision encoders pre-trained with a multimodal autoregressive objective. The original authors report strong results across a range of benchmarks.

Highlights reported for the AIMv2 models include:

1. Outperforming OpenAI CLIP and SigLIP on the majority of multimodal understanding benchmarks.
2. Surpassing DINOv2 in open-vocabulary object detection and referring expression comprehension.
3. Strong recognition performance, with AIMv2-3B achieving **89.5% on ImageNet using a frozen trunk**.

## Usage

### PyTorch

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_id = "amaye15/aimv2-large-patch14-native-image-classification"

# Load a sample image from the COCO validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

processor = AutoImageProcessor.from_pretrained(model_id)
# trust_remote_code=True lets Transformers load the custom model code shipped with this repository
model = AutoModelForImageClassification.from_pretrained(model_id, trust_remote_code=True)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():  # inference only, so no gradients are needed
    outputs = model(**inputs)

# Convert logits to class probabilities and take the most likely class
probabilities = outputs.logits.softmax(dim=-1)
predicted_class = probabilities.argmax(-1).item()

print(f"Predicted class: {model.config.id2label[predicted_class]}")
```
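
The example above reports only the single most likely class. To inspect the top few candidates, you can apply a small post-processing step to the logits, as in the sketch below. `top_k_predictions` is a hypothetical helper written for illustration. It is not part of this repository or of Transformers. It takes a plain Python list so that it can be tested without loading the model. With tensors, `torch.topk` applied to the softmax output gives the same ranking.

```python
import math
from typing import Dict, List, Sequence, Tuple


def top_k_predictions(
    logits: Sequence[float],
    id2label: Dict[int, str],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k most likely (label, probability) pairs for one image."""
    # Subtract the largest logit before exponentiating for numerical stability
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k largest probabilities, highest first
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in top]


# Usage with the model output from the PyTorch example (single image, batch index 0):
# for label, prob in top_k_predictions(outputs.logits[0].tolist(), model.config.id2label, k=5):
#     print(f"{label}: {prob:.3f}")
```

The softmax is computed explicitly so that the returned scores are probabilities rather than raw logits. The ranking alone would be identical without it.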

## Model Details

- **Model Name**: `amaye15/aimv2-large-patch14-native-image-classification`
- **Original Model**: `apple/aimv2-large-patch14-native`
- **Adaptation**: Modified to be compatible with `AutoModelForImageClassification` for direct use in image classification tasks.
- **Framework**: PyTorch
- **License**: MIT
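
The `patch14` in the model name follows the common Vision Transformer convention of splitting an image into 14×14-pixel patches. The `native` suffix suggests that images are processed at their own resolution rather than a single fixed size. This is an assumption worth checking against the processor configuration. If it holds, the encoder's sequence length, and with it memory use, grows with image size. The sketch below gives a rough count of patch tokens under the assumption of a standard non-overlapping patch embedding, with kernel and stride both equal to the patch size. `num_patch_tokens` is a hypothetical helper, and the count excludes any special tokens.

```python
def num_patch_tokens(height: int, width: int, patch_size: int = 14) -> int:
    """Number of non-overlapping patch tokens for an image of the given size.

    Assumes a ViT-style patch embedding (kernel = stride = patch_size), which
    drops leftover pixels that do not fill a whole patch.
    """
    if height < patch_size or width < patch_size:
        raise ValueError("image must be at least one patch in each dimension")
    return (height // patch_size) * (width // patch_size)


print(num_patch_tokens(224, 224))  # 16 x 16 grid -> 256
print(num_patch_tokens(448, 336))  # 32 x 24 grid -> 768
```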
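
## Citation

If you use this model or find it helpful, please consider citing the original AIMv2 paper:

```bibtex
@misc{fini2024multimodal,
  title={Multimodal Autoregressive Pre-training of Large Vision Encoders},
  author={Enrico Fini and Mustafa Shukor and Xiujun Li and Philipp Dufter and Michal Klein and David Haldimann and Sai Aitharaju and Victor Guilherme Turrisi da Costa and Louis Béthune and Zhe Gan and Alexander T Toshev and Marcin Eichner and Moin Nabi and Yinfei Yang and Joshua M. Susskind and Alaaeldin El-Nouby},
  year={2024},
  eprint={2411.14402},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```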