license: apache-2.0
pipeline_tag: mask-generation

NanoSAM: Accelerated Segment Anything Model for Edge deployment

Pretrained Models

NanoSAM performance on edge devices. Latency/throughput is measured on an NVIDIA Jetson Xavier NX and an NVIDIA T4 GPU, both with TensorRT in fp16. Data transfer time is included.

| Image Encoder | CPU | Jetson Xavier NX | T4 | Model size | Download |
| --- | --- | --- | --- | --- | --- |
| PPHGV2-B1 | 110ms | 9.6ms | 2.4ms | 12.7MB | Link |
| PPHGV2-B2 | 200ms | 12.4ms | 3.2ms | 29.5MB | Link |
| PPHGV2-B4 | 300ms | 17.3ms | 4.1ms | 61.4MB | Link |
| ResNet18 | 500ms | 22.4ms | 5.8ms | 63.2MB | Link |
| EfficientViT-L0 | 1s | 31.6ms | 6ms | 117.5MB | - |
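The table reports latency only. As a rough illustration, the sketch below converts the Jetson Xavier NX column into frames per second using `1000 / latency_ms`. It assumes one image is processed at a time and covers only the stage these latencies measure, so end-to-end throughput may be lower. The names `latency_ms` and `to_fps` are illustrative and not part of NanoSAM.

```python
# Jetson Xavier NX latencies (milliseconds), copied from the table above.
latency_ms = {
    "PPHGV2-B1": 9.6,
    "PPHGV2-B2": 12.4,
    "PPHGV2-B4": 17.3,
    "ResNet18": 22.4,
    "EfficientViT-L0": 31.6,
}


def to_fps(ms: float) -> float:
    """Convert a per-image latency in milliseconds to frames per second."""
    return 1000.0 / ms


# Throughput per encoder, assuming images are processed one at a time.
throughput = {name: to_fps(ms) for name, ms in latency_ms.items()}

for name, fps in throughput.items():
    print(f"{name}: {fps:.1f} FPS")
```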

Zero-Shot Instance Segmentation on COCO2017 validation dataset

| Image Encoder | mAP (mask, 50-95) | mIoU (all) | mIoU (large) | mIoU (medium) | mIoU (small) |
| --- | --- | --- | --- | --- | --- |
| ResNet18 | - | 70.6 | 79.6 | 73.8 | 62.4 |
| MobileSAM | - | 72.8 | 80.4 | 75.9 | 65.8 |
| PPHGV2-B1 | 41.2 | 75.6 | 81.2 | 77.4 | 70.8 |
| PPHGV2-B2 | 42.6 | 76.5 | 82.2 | 78.5 | 71.5 |
| PPHGV2-B4 | 44.0 | 77.3 | 83.0 | 79.7 | 72.1 |
| EfficientViT-L0 | 45.6 | 78.6 | 83.7 | 81.0 | 73.3 |

Usage

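import numpy as np
import PIL.Image

from nanosam.utils.predictor import Predictor

# Image encoder: ONNX model executed on the CPU.
image_encoder_cfg = {
    "path": "data/sam_hgv2_b4_ln_nonorm_image_encoder.onnx",
    "name": "OnnxModel",
    "provider": "cpu",
    "normalize_input": False,
}
# Mask decoder: ONNX model executed on the CPU.
mask_decoder_cfg = {
    "path": "data/efficientvit_l0_mask_decoder.onnx",
    "name": "OnnxModel",
    "provider": "cpu",
}
predictor = Predictor(image_encoder_cfg, mask_decoder_cfg)

image = PIL.Image.open("assets/dogs.jpg")

# Register the image with the predictor before prompting.
predictor.set_image(image)

# Prompt with one foreground point (label 1) at pixel (x, y).
x, y = 100, 100  # placeholder coordinates; replace with a point on the target object
mask, _, _ = predictor.predict(np.array([[x, y]]), np.array([1]))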

The point labels may be:

| Point Label | Description |
| --- | --- |
| 0 | Background point |
| 1 | Foreground point |
| 2 | Bounding box top-left |
| 3 | Bounding box bottom-right |
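As a minimal sketch of how these labels might be combined into prompts, the helpers below build the `points` and `point_labels` arrays for a point prompt and for a bounding-box prompt. The helper names `point_prompt` and `box_prompt` are hypothetical and not part of NanoSAM. Passing the two box corners as two rows with labels 2 and 3 is an assumption based on the table above.

```python
import numpy as np

# Label values taken from the point-label table above.
BACKGROUND = 0
FOREGROUND = 1
BOX_TOP_LEFT = 2
BOX_BOTTOM_RIGHT = 3


def point_prompt(foreground, background=()):
    """Build (points, labels) from lists of (x, y) foreground/background points."""
    foreground = list(foreground)
    background = list(background)
    points = np.array(foreground + background)
    labels = np.array(
        [FOREGROUND] * len(foreground) + [BACKGROUND] * len(background)
    )
    return points, labels


def box_prompt(x0, y0, x1, y1):
    """Build (points, labels) for a box with top-left (x0, y0) and bottom-right (x1, y1)."""
    points = np.array([[x0, y0], [x1, y1]])
    labels = np.array([BOX_TOP_LEFT, BOX_BOTTOM_RIGHT])
    return points, labels


# Example: one foreground point plus one background point, and a box.
pts, lbls = point_prompt([(120, 80)], [(10, 10)])
box_pts, box_lbls = box_prompt(50, 40, 300, 260)
```

With a `predictor` set up as in the Usage section, either pair would then be passed as `predictor.predict(pts, lbls)`.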