Update README.md
README.md
CHANGED
@@ -29,7 +29,8 @@ Segment Anything Model (SAM) has shown impressive zero-shot transfer performance

## Installation
```bash
-
# download pretrained checkpoint
mkdir weights && cd weights
wget https://github.com/THU-MIG/RepViT/releases/download/v1.0/repvit_sam.pt
@@ -42,11 +43,11 @@ python app/app.py
```

## CoreML export
-Please refer to [coreml_example.ipynb](


## Latency comparisons
-Comparison between RepViT-SAM and others in terms of latency. The latency (ms) is measured with the standard resolution of 1024

<table class="tg">
<thead>
@@ -74,195 +75,6 @@ Comparison between RepViT-SAM and others in terms of latency. The latency (ms) i
</tbody>
</table>

-
-## Zero-shot edge detection
-
-Comparison results on BSDS500.
-
-<table class="tg">
-<thead>
-<tr>
-<th class="tg-c3ow" rowspan="2">Model</th>
-<th class="tg-c3ow" colspan="3">zero-shot edge detection</th>
-</tr>
-<tr>
-<th class="tg-c3ow">ODS</th>
-<th class="tg-c3ow">OIS</th>
-<th class="tg-c3ow">AP</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td class="tg-c3ow">ViT-H-SAM</td>
-<td class="tg-c3ow"><b>.768</b></td>
-<td class="tg-c3ow"><b>.786</b></td>
-<td class="tg-c3ow"><b>.794</b></td>
-</tr>
-<tr>
-<td class="tg-c3ow">ViT-B-SAM</td>
-<td class="tg-c3ow">.743</td>
-<td class="tg-c3ow">.764</td>
-<td class="tg-c3ow">.726</td>
-</tr>
-<tr>
-<td class="tg-c3ow">MobileSAM</td>
-<td class="tg-c3ow">.756</td>
-<td class="tg-c3ow">.768</td>
-<td class="tg-c3ow">.746</td>
-</tr>
-<tr>
-<td class="tg-c3ow">RepViT-SAM</td>
-<td class="tg-c3ow"><ins>.764</ins></td>
-<td class="tg-c3ow"><ins>.786</ins></td>
-<td class="tg-c3ow"><ins>.773</ins></td>
-</tr>
-</tbody>
-</table>
-
-
-## Zero-shot instance segmentation and SegInW
-Comparison results on COCO and SegInW.
-
-<table class="tg">
-<thead>
-<tr>
-<th class="tg-c3ow" rowspan="2">Model</th>
-<th class="tg-c3ow" colspan="4">zero-shot instance segmentation</th>
-<th class="tg-c3ow">SegInW</th>
-</tr>
-<tr>
-<th class="tg-c3ow">AP</th>
-<th class="tg-c3ow">$AP^{S}$</th>
-<th class="tg-c3ow">$AP^{M}$</th>
-<th class="tg-c3ow">$AP^{L}$</th>
-<th class="tg-c3ow">Mean AP</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td class="tg-c3ow">ViT-H-SAM</td>
-<td class="tg-c3ow"><b>46.8</b></td>
-<td class="tg-c3ow"><b>31.8</b></td>
-<td class="tg-c3ow"><b>51.0</b></td>
-<td class="tg-c3ow"><b>63.6</b></td>
-<td class="tg-c3ow"><b>48.7</b></td>
-</tr>
-<tr>
-<td class="tg-c3ow">ViT-B-SAM</td>
-<td class="tg-c3ow">42.5</td>
-<td class="tg-c3ow"><ins>29.8</ins></td>
-<td class="tg-c3ow">47.0</td>
-<td class="tg-c3ow">56.8</td>
-<td class="tg-c3ow">44.8</td>
-</tr>
-<tr>
-<td class="tg-c3ow">MobileSAM</td>
-<td class="tg-c3ow">42.7</td>
-<td class="tg-c3ow">27.0</td>
-<td class="tg-c3ow">46.5</td>
-<td class="tg-c3ow">61.1</td>
-<td class="tg-c3ow">43.9</td>
-</tr>
-<tr>
-<td class="tg-c3ow">RepViT-SAM</td>
-<td class="tg-c3ow"><ins>44.4</ins></td>
-<td class="tg-c3ow">29.1</td>
-<td class="tg-c3ow"><ins>48.6</ins></td>
-<td class="tg-c3ow"><ins>61.4</ins></td>
-<td class="tg-c3ow"><ins>46.1</ins></td>
-</tr>
-</tbody>
-</table>
-
-## Zero-shot video object/instance segmentation
-Comparison results on DAVIS 2017 and UVO.
-
-<table class="tg">
-<thead>
-<tr>
-<th class="tg-c3ow" rowspan="2">Model</th>
-<th class="tg-c3ow" colspan="3">z.s. VOS</th>
-<th class="tg-c3ow">z.s. VIS</th>
-</tr>
-<tr>
-<th class="tg-c3ow">$\mathcal{J\&F}$</th>
-<th class="tg-c3ow">$\mathcal{J}$</th>
-<th class="tg-c3ow">$\mathcal{F}$</th>
-<th class="tg-c3ow">AR100</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td class="tg-c3ow">ViT-H-SAM</td>
-<td class="tg-c3ow"><b>77.4</b></td>
-<td class="tg-c3ow"><b>74.6</b></td>
-<td class="tg-c3ow"><b>80.2</b></td>
-<td class="tg-c3ow"><b>28.8</b></td>
-</tr>
-<tr>
-<td class="tg-c3ow">ViT-B-SAM</td>
-<td class="tg-c3ow">71.3</td>
-<td class="tg-c3ow">68.5</td>
-<td class="tg-c3ow">74.1</td>
-<td class="tg-c3ow">19.1</td>
-</tr>
-<tr>
-<td class="tg-c3ow">MobileSAM</td>
-<td class="tg-c3ow">71.1</td>
-<td class="tg-c3ow">68.6</td>
-<td class="tg-c3ow">73.6</td>
-<td class="tg-c3ow">22.7</td>
-</tr>
-<tr>
-<td class="tg-c3ow">RepViT-SAM</td>
-<td class="tg-c3ow"><ins>73.5</ins></td>
-<td class="tg-c3ow"><ins>71.0</ins></td>
-<td class="tg-c3ow"><ins>76.1</ins></td>
-<td class="tg-c3ow"><ins>25.3</ins></td>
-</tr>
-</tbody>
-</table>
-
-## Zero-shot salient object segmentation
-Comparison results on DUTS.
-## Zero-shot anomaly detection
-Comparison results on MVTec.
-<table class="tg">
-<thead>
-<tr>
-<th class="tg-c3ow" rowspan="2">Model</th>
-<th class="tg-c3ow">z.s. s.o.s.</th>
-<th class="tg-c3ow">z.s. a.d.</th>
-</tr>
-<tr>
-<th class="tg-c3ow">$\mathcal{M}$ $\downarrow$</th>
-<th class="tg-c3ow">$\mathcal{F}_{p}$</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td class="tg-c3ow">ViT-H-SAM</td>
-<td class="tg-c3ow"><b>0.046</b></td>
-<td class="tg-c3ow"><ins>37.65</ins></td>
-</tr>
-<tr>
-<td class="tg-c3ow">ViT-B-SAM</td>
-<td class="tg-c3ow">0.121</td>
-<td class="tg-c3ow">36.62</td>
-</tr>
-<tr>
-<td class="tg-c3ow">MobileSAM</td>
-<td class="tg-c3ow">0.147</td>
-<td class="tg-c3ow">36.44</td>
-</tr>
-<tr>
-<td class="tg-c3ow">RepViT-SAM</td>
-<td class="tg-c3ow"><ins>0.066</ins></td>
-<td class="tg-c3ow"><b>37.96</b></td>
-</tr>
-</tbody>
-</table>
-
## Acknowledgement

The code base is partly built with [SAM](https://github.com/facebookresearch/segment-anything) and [MobileSAM](https://github.com/ChaoningZhang/MobileSAM).
@@ -29,7 +29,8 @@ Segment Anything Model (SAM) has shown impressive zero-shot transfer performance

## Installation
```bash
+git clone https://github.com/THU-MIG/RepViT
+cd sam && pip install -e .
# download pretrained checkpoint
mkdir weights && cd weights
wget https://github.com/THU-MIG/RepViT/releases/download/v1.0/repvit_sam.pt
@@ -42,11 +43,11 @@ python app/app.py
```
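
A failed or interrupted `wget` can leave a missing or zero-byte `repvit_sam.pt`, so it may help to check the file before starting the app. The following is a minimal sketch, not part of the RepViT repository: `check_checkpoint` is a hypothetical helper, and the path passed to it is an assumption about where the download step was run.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the RepViT repository).
# Succeeds only if the given path is a regular, non-empty file.
check_checkpoint() {
    local path="$1"
    if [ ! -f "$path" ] || [ ! -s "$path" ]; then
        echo "missing or empty checkpoint: $path" >&2
        return 1
    fi
    echo "checkpoint found: $path"
}
```

Assuming the checkpoint was downloaded into `weights/` relative to the current directory, one possible use is `check_checkpoint weights/repvit_sam.pt && python app/app.py`. The check only confirms that the file exists and is non-empty; it does not verify its integrity.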

## CoreML export
+Please refer to [coreml_example.ipynb](https://github.com/THU-MIG/RepViT/blob/main/sam/notebooks/coreml_example.ipynb)


## Latency comparisons
+Comparison between RepViT-SAM and others in terms of latency. The latency (ms) is measured with the standard resolution of 1024 x 1024 on iPhone 12 and Macbook M1 Pro by Core ML Tools. OOM means out of memory.

<table class="tg">
<thead>
@@ -74,195 +75,6 @@ Comparison between RepViT-SAM and others in terms of latency. The latency (ms) i
</tbody>
</table>

## Acknowledgement

The code base is partly built with [SAM](https://github.com/facebookresearch/segment-anything) and [MobileSAM](https://github.com/ChaoningZhang/MobileSAM).