dragonSwing committed on
Commit 5f8531e · 1 Parent(s): 6955aa3

Update README.md

Files changed (1)
  1. README.md +133 -0
README.md CHANGED
@@ -1,3 +1,136 @@
  ---
  license: apache-2.0
+ pipeline_tag: mask-generation
  ---
+
+ # NanoSAM: Accelerated Segment Anything Model for Edge deployment
+
+ - [GitHub](https://github.com/binh234/nanosam)
+ - [Demo](https://huggingface.co/spaces/dragonSwing/nanosam)
+
+ ## Pretrained Models
+
13
+ NanoSAM performance on edge devices. Latency/throughput is measured on NVIDIA Jetson Xavier NX, and NVIDIA T4 GPU with TensorRT, fp16. Data transfer time is included.
14
+
+ <table style="border-top: solid 1px; border-left: solid 1px; border-right: solid 1px; border-bottom: solid 1px">
+ <thead>
+ <tr>
+ <th rowspan=2 style="text-align: center; border-right: solid 1px">Model †</th>
+ <th colspan=2 style="text-align: center; border-right: solid 1px">:stopwatch: CPU</th>
+ <th colspan=2 style="text-align: center; border-right: solid 1px">:stopwatch: Jetson Xavier NX</th>
+ <th colspan=2 style="text-align: center; border-right: solid 1px">:stopwatch: T4</th>
+ <th rowspan=2 style="text-align: center; border-right: solid 1px">Model Size</th>
+ <th rowspan=2 style="text-align: center; border-right: solid 1px">Link</th>
+ </tr>
+ <tr>
+ <th style="text-align: center; border-right: solid 1px">Image Encoder</th>
+ <th style="text-align: center; border-right: solid 1px">Full Pipeline</th>
+ <th style="text-align: center; border-right: solid 1px">Image Encoder</th>
+ <th style="text-align: center; border-right: solid 1px">Full Pipeline</th>
+ <th style="text-align: center; border-right: solid 1px">Image Encoder</th>
+ <th style="text-align: center; border-right: solid 1px">Full Pipeline</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td style="text-align: center; border-right: solid 1px">PPHGV2-SAM-B1</td>
+ <td style="text-align: center; border-right: solid 1px">110ms</td>
+ <td style="text-align: center; border-right: solid 1px">180ms</td>
+ <td style="text-align: center; border-right: solid 1px">9.6ms</td>
+ <td style="text-align: center; border-right: solid 1px">17ms</td>
+ <td style="text-align: center; border-right: solid 1px">2.4ms</td>
+ <td style="text-align: center; border-right: solid 1px">5.8ms</td>
+ <td style="text-align: center; border-right: solid 1px">12.1MB</td>
+ <td style="text-align: center; border-right: solid 1px"><a href="https://huggingface.co/dragonSwing/nanosam/resolve/main/sam_hgv2_b1_ln_nonorm_image_encoder.onnx">Link</a></td>
+ </tr>
+ <tr>
+ <td style="text-align: center; border-right: solid 1px">PPHGV2-SAM-B2</td>
+ <td style="text-align: center; border-right: solid 1px">200ms</td>
+ <td style="text-align: center; border-right: solid 1px">270ms</td>
+ <td style="text-align: center; border-right: solid 1px">12.4ms</td>
+ <td style="text-align: center; border-right: solid 1px">19.8ms</td>
+ <td style="text-align: center; border-right: solid 1px">3.2ms</td>
+ <td style="text-align: center; border-right: solid 1px">6.4ms</td>
+ <td style="text-align: center; border-right: solid 1px">28.1MB</td>
+ <td style="text-align: center; border-right: solid 1px"><a href="https://huggingface.co/dragonSwing/nanosam/resolve/main/sam_hgv2_b4_ln_nonorm_image_encoder.onnx">Link</a></td>
+ </tr>
+ <tr>
+ <td style="text-align: center; border-right: solid 1px">PPHGV2-SAM-B4</td>
+ <td style="text-align: center; border-right: solid 1px">300ms</td>
+ <td style="text-align: center; border-right: solid 1px">370ms</td>
+ <td style="text-align: center; border-right: solid 1px">17.3ms</td>
+ <td style="text-align: center; border-right: solid 1px">24.7ms</td>
+ <td style="text-align: center; border-right: solid 1px">4.1ms</td>
+ <td style="text-align: center; border-right: solid 1px">7.5ms</td>
+ <td style="text-align: center; border-right: solid 1px">58.6MB</td>
+ <td style="text-align: center; border-right: solid 1px"><a href="https://huggingface.co/dragonSwing/nanosam/resolve/main/sam_hgv2_b4_ln_nonorm_image_encoder.onnx">Link</a></td>
+ </tr>
+ <tr>
+ <td style="text-align: center; border-right: solid 1px">NanoSAM (ResNet18)</td>
+ <td style="text-align: center; border-right: solid 1px">500ms</td>
+ <td style="text-align: center; border-right: solid 1px">570ms</td>
+ <td style="text-align: center; border-right: solid 1px">22.4ms</td>
+ <td style="text-align: center; border-right: solid 1px">29.8ms</td>
+ <td style="text-align: center; border-right: solid 1px">5.8ms</td>
+ <td style="text-align: center; border-right: solid 1px">9.2ms</td>
+ <td style="text-align: center; border-right: solid 1px">60.4MB</td>
+ <td style="text-align: center; border-right: solid 1px"><a href="https://drive.google.com/file/d/14-SsvoaTl-esC3JOzomHDnI9OGgdO2OR/view?usp=drive_link">Link</a></td>
+ </tr>
+ <tr>
+ <td style="text-align: center; border-right: solid 1px">EfficientViT-SAM-L0</td>
+ <td style="text-align: center; border-right: solid 1px">1s</td>
+ <td style="text-align: center; border-right: solid 1px">1.07s</td>
+ <td style="text-align: center; border-right: solid 1px">31.6ms</td>
+ <td style="text-align: center; border-right: solid 1px">38ms</td>
+ <td style="text-align: center; border-right: solid 1px">6ms</td>
+ <td style="text-align: center; border-right: solid 1px">9.4ms</td>
+ <td style="text-align: center; border-right: solid 1px">117.4MB</td>
+ <td style="text-align: center; border-right: solid 1px"></td>
+ </tr>
+ </tbody>
+ </table>
+
+ Zero-shot instance segmentation on the COCO 2017 validation set:
+
+ | Image Encoder | mAP<sup>mask</sup><sub>50-95</sub> | mIoU (all) | mIoU (large) | mIoU (medium) | mIoU (small) |
+ | --------------- | :-------------------: | :--------: | :----------: | :-----------: | :----------: |
+ | ResNet18 | - | 70.6 | 79.6 | 73.8 | 62.4 |
+ | MobileSAM | - | 72.8 | 80.4 | 75.9 | 65.8 |
+ | PPHGV2-B1 | 41.2 | 75.6 | 81.2 | 77.4 | 70.8 |
+ | PPHGV2-B2 | 42.6 | 76.5 | 82.2 | 78.5 | 71.5 |
+ | PPHGV2-B4 | 44.0 | 77.3 | 83.0 | 79.7 | 72.1 |
+ | EfficientViT-L0 | 45.6 | 78.6 | 83.7 | 81.0 | 73.3 |
+
+ ## Usage
+
+ ```python3
+ import numpy as np
+ import PIL.Image
+
+ from nanosam.utils.predictor import Predictor
+
+ # Image encoder: exported ONNX model, run with the CPU provider.
+ image_encoder_cfg = {
+     "path": "data/sam_hgv2_b4_ln_nonorm_image_encoder.onnx",
+     "name": "OnnxModel",
+     "provider": "cpu",
+     "normalize_input": False,
+ }
+ # Mask decoder: ONNX model that turns prompts and image features into masks.
+ mask_decoder_cfg = {
+     "path": "data/efficientvit_l0_mask_decoder.onnx",
+     "name": "OnnxModel",
+     "provider": "cpu",
+ }
+ predictor = Predictor(image_encoder_cfg, mask_decoder_cfg)
+
+ image = PIL.Image.open("assets/dogs.jpg")
+
+ # Run the image encoder on this image.
+ predictor.set_image(image)
+
+ # Prompt with one foreground point (label 1) at pixel (x, y).
+ # These coordinates are placeholders; choose a point on the target object.
+ x, y = 320, 240
+ mask, _, _ = predictor.predict(np.array([[x, y]]), np.array([1]))
+ ```
+
+ Each point label must be one of the following:
+
+ | Point Label | Description |
+ | :---------: | ------------------------- |
+ | 0 | Background point |
+ | 1 | Foreground point |
+ | 2 | Bounding box top-left |
+ | 3 | Bounding box bottom-right |
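The point-label table above implies that a box prompt is passed as two extra points labeled 2 (top-left) and 3 (bottom-right). The following is a minimal sketch, not part of the nanosam package, of a hypothetical `build_prompt` helper that packs foreground/background clicks and an optional `(x0, y0, x1, y1)` box into the `(points, labels)` pair used in the usage snippet. It assumes points and box corners share the same pixel coordinates; whether a given decoder handles mixed point-and-box prompts well is not stated in this README.

```python
from typing import Iterable, List, Optional, Tuple

# Label values taken from the README's point-label table.
BACKGROUND = 0
FOREGROUND = 1
BOX_TOP_LEFT = 2
BOX_BOTTOM_RIGHT = 3

Point = Tuple[float, float]


def build_prompt(
    foreground: Iterable[Point] = (),
    background: Iterable[Point] = (),
    box: Optional[Tuple[float, float, float, float]] = None,
) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical helper: pack clicks and an optional (x0, y0, x1, y1) box
    into parallel lists of points and labels."""
    points: List[List[float]] = []
    labels: List[int] = []
    for x, y in foreground:
        points.append([x, y])
        labels.append(FOREGROUND)
    for x, y in background:
        points.append([x, y])
        labels.append(BACKGROUND)
    if box is not None:
        x0, y0, x1, y1 = box
        # Normalize the corners so label 2 is always top-left, label 3 bottom-right.
        points.append([min(x0, x1), min(y0, y1)])
        labels.append(BOX_TOP_LEFT)
        points.append([max(x0, x1), max(y0, y1)])
        labels.append(BOX_BOTTOM_RIGHT)
    if not points:
        raise ValueError("at least one point or a box is required")
    return points, labels
```

For example, `points, labels = build_prompt(foreground=[(320, 240)], box=(100, 80, 500, 400))` followed by `predictor.predict(np.array(points), np.array(labels))`. The helper returns plain lists so it has no dependencies; convert them with `np.array` as in the usage snippet.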
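The latency table reports per-stage timings with data transfer included, but this commit does not include the benchmark script. As an illustration only, a generic wall-clock timer like the hypothetical `measure_latency_ms` below follows that style of measurement: warm up first, then time whole calls that include any copies to and from the device. For GPU back ends, the timed function must block until results are back on the host; otherwise asynchronous execution makes the numbers look too low.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(
    fn: Callable[[], object], warmup: int = 10, iters: int = 50
) -> Dict[str, float]:
    """Time repeated calls of fn() and summarize wall-clock latency in ms.

    fn should include everything meant to be counted (input transfer,
    inference, copying results back) and must not return before the
    work has actually finished.
    """
    if iters <= 0:
        raise ValueError("iters must be positive")
    # Warm-up runs absorb one-time costs (lazy init, caches) and are not recorded.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
    }
```

Timing `lambda: predictor.set_image(image)` would roughly correspond to the "Image Encoder" column, and timing a function that calls both `set_image` and `predict` to "Full Pipeline". Results depend on hardware and execution provider and are not expected to match the table exactly.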
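The COCO table breaks mIoU down by object size. This README does not say how prompts were chosen or which area definition was used, so the sketch below is an illustration under stated assumptions, not the authors' evaluation code. It computes IoU per object between a predicted and a ground-truth binary mask, then averages overall and per COCO size range (small below 32² pixels, medium below 96², large otherwise), using a caller-supplied area such as the COCO annotation `area` field. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 2-D binary mask, nonzero = object pixel


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    if len(pred) != len(gt) or any(len(a) != len(b) for a, b in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            p_on, g_on = bool(p), bool(g)
            inter += int(p_on and g_on)
            union += int(p_on or g_on)
    # Convention choice: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def size_bucket(area: float) -> str:
    """COCO area ranges: small < 32^2, medium < 96^2, large otherwise."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def miou_by_size(pairs: Sequence[Tuple[Mask, Mask, float]]) -> Dict[str, float]:
    """pairs holds (pred_mask, gt_mask, gt_area) per object.
    Returns mean IoU for "all" and for each size bucket that has objects."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for pred, gt, area in pairs:
        iou = mask_iou(pred, gt)
        buckets["all"].append(iou)
        buckets[size_bucket(area)].append(iou)
    return {name: sum(v) / len(v) for name, v in buckets.items()}
```

Buckets with no objects are omitted from the result rather than reported as zero, so an empty "medium" entry never drags down the averages.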