jameslahm committed on
Commit
7c85355
1 Parent(s): c97afbd

Update README.md

Files changed (1)
  1. README.md +4 -192
README.md CHANGED
@@ -29,7 +29,8 @@ Segment Anything Model (SAM) has shown impressive zero-shot transfer performance
 
  ## Installation
  ```bash
- pip install -e .
+ git clone https://github.com/THU-MIG/RepViT
+ cd sam && pip install -e .
  # download pretrained checkpoint
  mkdir weights && cd weights
  wget https://github.com/THU-MIG/RepViT/releases/download/v1.0/repvit_sam.pt
@@ -42,11 +43,11 @@ python app/app.py
  ```
 
  ## CoreML export
- Please refer to [coreml_example.ipynb](./notebooks/coreml_example.ipynb)
+ Please refer to [coreml_example.ipynb](https://github.com/THU-MIG/RepViT/blob/main/sam/notebooks/coreml_example.ipynb)
 
 
  ## Latency comparisons
- Comparison between RepViT-SAM and others in terms of latency. The latency (ms) is measured with the standard resolution of 1024 $\times$ 1024 on iPhone 12 and Macbook M1 Pro by Core ML Tools. OOM means out of memory.
+ Comparison between RepViT-SAM and others in terms of latency. The latency (ms) is measured with the standard resolution of 1024 x 1024 on iPhone 12 and Macbook M1 Pro by Core ML Tools. OOM means out of memory.
 
  <table class="tg">
  <thead>
@@ -74,195 +75,6 @@ Comparison between RepViT-SAM and others in terms of latency. The latency (ms) i
  </tbody>
  </table>
 
-
- ## Zero-shot edge detection
-
- Comparison results on BSDS500.
-
- <table class="tg">
- <thead>
- <tr>
- <th class="tg-c3ow" rowspan="2">Model</th>
- <th class="tg-c3ow" colspan="3">zero-shot edge detection</th>
- </tr>
- <tr>
- <th class="tg-c3ow">ODS</th>
- <th class="tg-c3ow">OIS</th>
- <th class="tg-c3ow">AP</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td class="tg-c3ow">ViT-H-SAM</td>
- <td class="tg-c3ow"><b>.768</b></td>
- <td class="tg-c3ow"><b>.786</b></td>
- <td class="tg-c3ow"><b>.794</b></td>
- </tr>
- <tr>
- <td class="tg-c3ow">ViT-B-SAM</td>
- <td class="tg-c3ow">.743</td>
- <td class="tg-c3ow">.764</td>
- <td class="tg-c3ow">.726</td>
- </tr>
- <tr>
- <td class="tg-c3ow">MobileSAM</td>
- <td class="tg-c3ow">.756</td>
- <td class="tg-c3ow">.768</td>
- <td class="tg-c3ow">.746</td>
- </tr>
- <tr>
- <td class="tg-c3ow">RepViT-SAM</td>
- <td class="tg-c3ow"><ins>.764</ins></td>
- <td class="tg-c3ow"><ins>.786</ins></td>
- <td class="tg-c3ow"><ins>.773</ins></td>
- </tr>
- </tbody>
- </table>
-
-
- ## Zero-shot instance segmentation and SegInW
- Comparison results on COCO and SegInW.
-
- <table class="tg">
- <thead>
- <tr>
- <th class="tg-c3ow" rowspan="2">Model</th>
- <th class="tg-c3ow" colspan="4">zero-shot instance segmentation</th>
- <th class="tg-c3ow">SegInW</th>
- </tr>
- <tr>
- <th class="tg-c3ow">AP</th>
- <th class="tg-c3ow">$AP^{S}$</th>
- <th class="tg-c3ow">$AP^{M}$</th>
- <th class="tg-c3ow">$AP^{L}$</th>
- <th class="tg-c3ow">Mean AP</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td class="tg-c3ow">ViT-H-SAM</td>
- <td class="tg-c3ow"><b>46.8</b></td>
- <td class="tg-c3ow"><b>31.8</b></td>
- <td class="tg-c3ow"><b>51.0</b></td>
- <td class="tg-c3ow"><b>63.6</b></td>
- <td class="tg-c3ow"><b>48.7</b></td>
- </tr>
- <tr>
- <td class="tg-c3ow">ViT-B-SAM</td>
- <td class="tg-c3ow">42.5</td>
- <td class="tg-c3ow"><ins>29.8</ins></td>
- <td class="tg-c3ow">47.0</td>
- <td class="tg-c3ow">56.8</td>
- <td class="tg-c3ow">44.8</td>
- </tr>
- <tr>
- <td class="tg-c3ow">MobileSAM</td>
- <td class="tg-c3ow">42.7</td>
- <td class="tg-c3ow">27.0</td>
- <td class="tg-c3ow">46.5</td>
- <td class="tg-c3ow">61.1</td>
- <td class="tg-c3ow">43.9</td>
- </tr>
- <tr>
- <td class="tg-c3ow">RepViT-SAM</td>
- <td class="tg-c3ow"><ins>44.4</ins></td>
- <td class="tg-c3ow">29.1</td>
- <td class="tg-c3ow"><ins>48.6</ins></td>
- <td class="tg-c3ow"><ins>61.4</ins></td>
- <td class="tg-c3ow"><ins>46.1</ins></td>
- </tr>
- </tbody>
- </table>
-
- ## Zero-shot video object/instance segmentation
- Comparison results on DAVIS 2017 and UVO.
-
- <table class="tg">
- <thead>
- <tr>
- <th class="tg-c3ow" rowspan="2">Model</th>
- <th class="tg-c3ow" colspan="3">z.s. VOS</th>
- <th class="tg-c3ow">z.s. VIS</th>
- </tr>
- <tr>
- <th class="tg-c3ow">$\mathcal{J\&amp;F}$</th>
- <th class="tg-c3ow">$\mathcal{J}$</th>
- <th class="tg-c3ow">$\mathcal{F}$</th>
- <th class="tg-c3ow">AR100</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td class="tg-c3ow">ViT-H-SAM</td>
- <td class="tg-c3ow"><b>77.4</b></td>
- <td class="tg-c3ow"><b>74.6</b></td>
- <td class="tg-c3ow"><b>80.2</b></td>
- <td class="tg-c3ow"><b>28.8</b></td>
- </tr>
- <tr>
- <td class="tg-c3ow">ViT-B-SAM</td>
- <td class="tg-c3ow">71.3</td>
- <td class="tg-c3ow">68.5</td>
- <td class="tg-c3ow">74.1</td>
- <td class="tg-c3ow">19.1</td>
- </tr>
- <tr>
- <td class="tg-c3ow">MobileSAM</td>
- <td class="tg-c3ow">71.1</td>
- <td class="tg-c3ow">68.6</td>
- <td class="tg-c3ow">73.6</td>
- <td class="tg-c3ow">22.7</td>
- </tr>
- <tr>
- <td class="tg-c3ow">RepViT-SAM</td>
- <td class="tg-c3ow"><ins>73.5</ins></td>
- <td class="tg-c3ow"><ins>71.0</ins></td>
- <td class="tg-c3ow"><ins>76.1</ins></td>
- <td class="tg-c3ow"><ins>25.3</ins></td>
- </tr>
- </tbody>
- </table>
-
- ## Zero-shot salient object segmentation
- Comparison results on DUTS.
- ## Zero-shot anomaly detection
- Comparison results on MVTec.
- <table class="tg">
- <thead>
- <tr>
- <th class="tg-c3ow" rowspan="2">Model</th>
- <th class="tg-c3ow">z.s. s.o.s.</th>
- <th class="tg-c3ow">z.s. a.d.</th>
- </tr>
- <tr>
- <th class="tg-c3ow">$\mathcal{M}$ $\downarrow$</th>
- <th class="tg-c3ow">$\mathcal{F}_{p}$</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td class="tg-c3ow">ViT-H-SAM</td>
- <td class="tg-c3ow"><b>0.046</b></td>
- <td class="tg-c3ow"><ins>37.65</ins></td>
- </tr>
- <tr>
- <td class="tg-c3ow">ViT-B-SAM</td>
- <td class="tg-c3ow">0.121</td>
- <td class="tg-c3ow">36.62</td>
- </tr>
- <tr>
- <td class="tg-c3ow">MobileSAM</td>
- <td class="tg-c3ow">0.147</td>
- <td class="tg-c3ow">36.44</td>
- </tr>
- <tr>
- <td class="tg-c3ow">RepViT-SAM</td>
- <td class="tg-c3ow"><ins>0.066</ins></td>
- <td class="tg-c3ow"><b>37.96</b></td>
- </tr>
- </tbody>
- </table>
-
  ## Acknowledgement
 
  The code base is partly built with [SAM](https://github.com/facebookresearch/segment-anything) and [MobileSAM](https://github.com/ChaoningZhang/MobileSAM).