File size: 3,076 Bytes
93eb021
 
 
545c86c
 
 
 
 
 
 
4e2e524
545c86c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
---
license: mit
---

# Faster Segement Anything (MobileSAM)

<!-- Provide a quick summary of what the model is/does. -->

- **Repository:** [Github - MobileSAM](https://github.com/ChaoningZhang/MobileSAM)
- **Paper:** [Faster Segment Anything: Towards Lightweight SAM for Mobile Applications](https://arxiv.org/pdf/2306.14289.pdf)
- **Demo:** [HuggingFace Demo](https://huggingface.co/spaces/dhkim2810/MobileSAM)

**MobileSAM** performs on par with the original SAM (at least visually) and keeps exactly the same pipeline as the original SAM except for a change on the image encoder. Specifically, we replace the original heavyweight ViT-H encoder (632M) with a much smaller Tiny-ViT (5M). On a single GPU, MobileSAM runs around 12ms per image: 8ms on the image encoder and 4ms on the mask decoder. 

The comparison of ViT-based image encoder is summarzed as follows: 

Image Encoder | Original SAM | MobileSAM
:------------:|:-------------:|:---------:
Paramters      |  611M   | 5M
Speed      |  452ms  | 8ms

Original SAM and MobileSAM have exactly the same prompt-guided mask decoder: 

Mask Decoder                                      | Original SAM | MobileSAM 
:-----------------------------------------:|:---------:|:-----:
Paramters      |  3.876M   | 3.876M
Speed      |  4ms  | 4ms

The comparison of the whole pipeline is summarzed as follows: 
Whole Pipeline (Enc+Dec)                                      | Original SAM | MobileSAM 
:-----------------------------------------:|:---------:|:-----:
Paramters      |  615M   | 9.66M
Speed      |  456ms  | 12ms


## Acknowledgement

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

<details>
<summary>
<a href="https://github.com/facebookresearch/segment-anything">SAM</a> (Segment Anything) [<b>bib</b>]
</summary>

```bibtex
@article{kirillov2023segany,
  title={Segment Anything}, 
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}
```
</details>



<details>
<summary>
<a href="https://github.com/microsoft/Cream/tree/main/TinyViT">TinyViT</a> (TinyViT: Fast Pretraining Distillation for Small Vision Transformers) [<b>bib</b>]
</summary>

```bibtex
@InProceedings{tiny_vit,
  title={TinyViT: Fast Pretraining Distillation for Small Vision Transformers},
  author={Wu, Kan and Zhang, Jinnian and Peng, Houwen and Liu, Mengchen and Xiao, Bin and Fu, Jianlong and Yuan, Lu},
  booktitle={European conference on computer vision (ECCV)},
  year={2022}
```
</details>


**BibTeX:**
```bibtex
@article{mobile_sam,
  title={Faster Segment Anything: Towards Lightweight SAM for Mobile Applications},
  author={Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon},
  journal={arXiv preprint arXiv:2306.14289},
  year={2023}
}
```