Image Classification
PyTorch
ml-aim
alaaelnouby committed on
Commit
5ac297a
1 Parent(s): 12524c5

Update README.md

  1. README.md +80 -0
README.md CHANGED
@@ -2,4 +2,84 @@
  license: other
  license_name: apple-sample-code-license
  license_link: LICENSE
+ library_name: ml-aim
+ pipeline_tag: image-classification
  ---
+
+ # AIM: Autoregressive Image Models
+
+ *Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar,
+ Joshua M Susskind, and Armand Joulin*
+
+
+ This software project accompanies the research paper [Scalable Pre-training of Large Autoregressive Image Models](https://arxiv.org/abs/2401.08541).
+
+ We introduce **AIM**, a collection of vision models pre-trained with an autoregressive generative objective.
+ We show that autoregressive pre-training of image features exhibits scaling properties similar to those of its
+ textual counterpart (i.e., Large Language Models). Specifically, we highlight two findings:
+ 1. model capacity can be trivially scaled to billions of parameters, and
+ 2. AIM effectively leverages large collections of uncurated image data.
+
+ ## Installation
+ Please install PyTorch using the official [installation instructions](https://pytorch.org/get-started/locally/).
+ Afterward, install the package:
+ ```commandline
+ pip install git+https://git@github.com/apple/ml-aim.git
+ ```
+
+
+ ## Usage
+ The example below loads the model via the [Hugging Face Hub](https://huggingface.co/docs/hub/):
+ ```python
+ from PIL import Image
+
+ from aim.torch.models import AIMForImageClassification
+ from aim.torch.data import val_transforms
+
+ img = Image.open(...)
+ model = AIMForImageClassification.from_pretrained("apple/aim-600M")
+ transform = val_transforms()
+
+ inp = transform(img).unsqueeze(0)  # add a batch dimension
+ logits, features = model(inp)
+ ```
+
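The Usage snippet stops at raw `logits`. As a rough sketch of one possible next step (not part of the README), the code below turns a flat list of scores into ranked class probabilities. It assumes `logits` is a PyTorch tensor of shape `(1, num_classes)`, so that something like `logits.squeeze(0).tolist()` would yield such a list; that shape is an assumption, not something the README states. The helper names `softmax` and `top_k` and the example scores are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=5):
    """Return the k (class_index, probability) pairs with the highest probability."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Stand-in scores for a 4-class problem (made-up numbers, not model output).
example_logits = [0.5, 2.0, -1.0, 1.0]
predictions = top_k(example_logits, k=2)
```

Mapping the returned class indices to human-readable ImageNet labels would need a separate label file, which the README does not mention.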
+ ### ImageNet-1k results (frozen trunk)
+
+ The table below reports top-1 classification accuracy on the ImageNet-1k validation set.
+
+ <table style="margin: auto">
+ <thead>
+ <tr>
+ <th rowspan="2">model</th>
+ <th colspan="2">top-1 IN-1k</th>
+ </tr>
+ <tr>
+ <th>last layer</th>
+ <th>best layer</th>
+ </tr>
+ </thead>
+
+ <tbody>
+ <tr>
+ <td>AIM-0.6B</td>
+ <td>78.5%</td>
+ <td>79.4%</td>
+ </tr>
+ <tr>
+ <td>AIM-1B</td>
+ <td>80.6%</td>
+ <td>82.3%</td>
+ </tr>
+ <tr>
+ <td>AIM-3B</td>
+ <td>82.2%</td>
+ <td>83.3%</td>
+ </tr>
+ <tr>
+ <td>AIM-7B</td>
+ <td>82.4%</td>
+ <td>84.0%</td>
+ </tr>
+ </tbody>
+ </table>
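To make the difference between the two columns easier to read, the short sketch below computes, for each model, how many percentage points the "best layer" figure exceeds the "last layer" figure. The accuracy values are copied from the table above; the variable names are illustrative only.

```python
# (last layer, best layer) top-1 accuracy in percent, copied from the table above.
results = {
    "AIM-0.6B": (78.5, 79.4),
    "AIM-1B": (80.6, 82.3),
    "AIM-3B": (82.2, 83.3),
    "AIM-7B": (82.4, 84.0),
}

# Gain of the best layer over the last layer, in percentage points.
gains = {name: round(best - last, 1) for name, (last, best) in results.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```

For the four models this gives gains of 0.9, 1.7, 1.1, and 1.6 points respectively.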