metadata
license: apache-2.0
pipeline_tag: mask-generation

NanoSAM: Accelerated Segment Anything Model for Edge deployment

Pretrained Models

NanoSAM performance on edge devices. Latency/throughput is measured on an NVIDIA Jetson Xavier NX and an NVIDIA T4 GPU with TensorRT in fp16. Data transfer time is included.

| Model † | :stopwatch: CPU (ms), Image Encoder | :stopwatch: CPU (ms), Full Pipeline | :stopwatch: Jetson Xavier NX (ms), Image Encoder | :stopwatch: Jetson Xavier NX (ms), Full Pipeline | :stopwatch: T4 (ms), Image Encoder | :stopwatch: T4 (ms), Full Pipeline | Model Size | Link |
|---|---|---|---|---|---|---|---|---|
| PPHGV2-SAM-B1 | 110ms | 180ms | 9.6ms | 17ms | 2.4ms | 5.8ms | 12.1MB | Link |
| PPHGV2-SAM-B2 | 200ms | 270ms | 12.4ms | 19.8ms | 3.2ms | 6.4ms | 28.1MB | Link |
| PPHGV2-SAM-B4 | 300ms | 370ms | 17.3ms | 24.7ms | 4.1ms | 7.5ms | 58.6MB | Link |
| NanoSAM (ResNet18) | 500ms | 570ms | 22.4ms | 29.8ms | 5.8ms | 9.2ms | 60.4MB | Link |
| EfficientViT-SAM-L0 | 1s | 1.07s | 31.6ms | 38ms | 6ms | 9.4ms | 117.4MB | |

Zero-Shot Instance Segmentation on COCO2017 validation dataset

| Image Encoder | mAP (mask, 50-95) | mIoU (all) | mIoU (large) | mIoU (medium) | mIoU (small) |
|---|---|---|---|---|---|
| ResNet18 | - | 70.6 | 79.6 | 73.8 | 62.4 |
| MobileSAM | - | 72.8 | 80.4 | 75.9 | 65.8 |
| PPHGV2-B1 | 41.2 | 75.6 | 81.2 | 77.4 | 70.8 |
| PPHGV2-B2 | 42.6 | 76.5 | 82.2 | 78.5 | 71.5 |
| PPHGV2-B4 | 44.0 | 77.3 | 83.0 | 79.7 | 72.1 |
| EfficientViT-L0 | 45.6 | 78.6 | 83.7 | 81.0 | 73.3 |

Usage

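import numpy as np
import PIL.Image

from nanosam.utils.predictor import Predictor

# ONNX image encoder, run on the CPU provider
image_encoder_cfg = {
    "path": "data/sam_hgv2_b4_ln_nonorm_image_encoder.onnx",
    "name": "OnnxModel",
    "provider": "cpu",
    "normalize_input": False,
}
# ONNX mask decoder, run on the CPU provider
mask_decoder_cfg = {
    "path": "data/efficientvit_l0_mask_decoder.onnx",
    "name": "OnnxModel",
    "provider": "cpu",
}
predictor = Predictor(image_encoder_cfg, mask_decoder_cfg)

image = PIL.Image.open("assets/dogs.jpg")

predictor.set_image(image)

# Prompt point (x, y) in pixel coordinates; the image center is used here only as an example
x, y = image.width // 2, image.height // 2
# Label 1 marks the point as a foreground point
mask, _, _ = predictor.predict(np.array([[x, y]]), np.array([1]))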

The point labels may be:

| Point Label | Description |
|---|---|
| 0 | Background point |
| 1 | Foreground point |
| 2 | Bounding box top-left |
| 3 | Bounding box bottom-right |
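
As a rough illustration of how these labels pair with points, the sketch below builds the `points` and `point_labels` inputs for a bounding-box prompt and for foreground/background clicks. The helpers `box_prompt` and `click_prompt` are hypothetical and not part of nanosam. The sketch assumes pixel coordinates with the origin at the top-left of the image, and returns plain Python lists so it runs without any dependencies.

```python
def box_prompt(x0, y0, x1, y1):
    """Return (points, labels) for a bounding-box prompt.

    Hypothetical helper, not part of nanosam. The corners are reordered so
    that label 2 is attached to the top-left corner and label 3 to the
    bottom-right corner (assumes the image origin is at the top-left).
    """
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    points = [[left, top], [right, bottom]]
    labels = [2, 3]
    return points, labels


def click_prompt(foreground, background=()):
    """Return (points, labels) for foreground (1) and background (0) clicks.

    Hypothetical helper, not part of nanosam.
    """
    points = [list(p) for p in foreground] + [list(p) for p in background]
    labels = [1] * len(foreground) + [0] * len(background)
    return points, labels


# Corners given in the "wrong" order are normalized by box_prompt.
points, labels = box_prompt(400, 300, 100, 50)
```

To use the result with the predictor from the Usage snippet, convert both lists to arrays first, for example `predictor.predict(np.array(points), np.array(labels))`.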