DeepLabV3+ ResNet50 for human body parts segmentation

This is a very simple ONNX model that can segment human body parts.

Why this model

This model is an ONNX conversion of keras-io/deeplabv3p-resnet50, which segments human body parts. All the other models I found were trained for city (urban scene) segmentation.

The original model was built for an old version of Keras and cannot be loaded with recent versions of TensorFlow, so I converted it to the ONNX format.

Usage

Download the `deeplabv3p-resnet50-human.onnx` file and run it with the ONNX Runtime package (`onnxruntime`). The model expects a batch of 512×512 RGB images scaled to [-1, 1].

`model.run` returns a list holding a single output array, so the result can be seen as a (1, 1, 512, 512, 20) tensor:

1: number of outputs (you can squeeze it)
1: batch size (you can squeeze it)
512, 512: the size of the image (fixed)
20: number of classes, so you can take the `argmax` over this last axis to get the class of each pixel
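The shape handling above can be wrapped in a small helper. This is a minimal sketch, assuming the output layout described above (a list holding one array of shape (1, 512, 512, 20)); the function name `outputs_to_class_map` is hypothetical and random scores stand in for a real `model.run` result.

```python
import numpy as np


def outputs_to_class_map(outputs):
    """Turn the list returned by model.run into a (512, 512) class map.

    Assumes `outputs` is a list holding a single array of shape
    (1, 512, 512, 20): batch, height, width, class scores.
    """
    # stacking the list gives (1, 1, 512, 512, 20): outputs, batch, H, W, classes
    stacked = np.asarray(outputs)
    # squeeze the "number of outputs" and "batch size" axes -> (512, 512, 20)
    scores = stacked.squeeze(axis=(0, 1))
    # pick the highest-scoring class for each pixel -> (512, 512) integer array
    return scores.argmax(axis=-1)


# usage: random scores stand in for a real model.run(...) result
fake_outputs = [np.random.rand(1, 512, 512, 20).astype(np.float32)]
class_map = outputs_to_class_map(fake_outputs)
```

Squeezing only the two leading axes (rather than calling `squeeze()` with no arguments) keeps the function safe if a future export ever changes the spatial size.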

```python
import sys

import numpy as np
import onnxruntime
from PIL import Image

model = onnxruntime.InferenceSession("deeplabv3p-resnet50-human.onnx")

# load the image (first CLI argument, or "image.jpg") as 3-channel RGB
img = Image.open(sys.argv[1] if len(sys.argv) > 1 else "image.jpg").convert("RGB")
img = img.resize((512, 512))
# scale pixels to [-1, 1] and add the batch dimension -> (1, 512, 512, 3)
img = np.array(img).astype(np.float32) / 127.5 - 1
img = np.expand_dims(img, axis=0)

# infer
input_name = model.get_inputs()[0].name
output_name = model.get_outputs()[0].name
result = model.run([output_name], {input_name: img})

# result is a list with one output of shape (1, 512, 512, 20)
result = np.array(result[0])
# argmax over the classes, then remove the batch dimension -> (512, 512)
result = result.argmax(axis=3).squeeze(0)

# get the masks
for i in range(20):
    detected = result == i  # get the detected pixels for the class i
    # detected is a (512, 512) boolean array
    mask = np.zeros((512, 512), dtype=np.uint8)  # 8-bit grayscale for PIL
    mask[detected] = 255
    Image.fromarray(mask).show()  # or save, or return the mask...
```
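The usage example stretches the image to 512×512, so the resulting class map no longer matches the original photo's size. A minimal sketch, assuming you want per-pixel labels at the original resolution: the hypothetical helpers `to_model_input` and `class_map_to_size` repeat the [-1, 1] scaling from the example (plus the batch axis the model input needs) and upsample the class map with nearest-neighbour indexing in plain NumPy.

```python
import numpy as np


def to_model_input(rgb):
    """Scale a (512, 512, 3) uint8 RGB array to [-1, 1] and add a batch axis."""
    if rgb.shape != (512, 512, 3):
        raise ValueError(f"expected (512, 512, 3), got {rgb.shape}")
    x = rgb.astype(np.float32) / 127.5 - 1.0
    return x[np.newaxis]  # (1, 512, 512, 3)


def class_map_to_size(class_map, height, width):
    """Nearest-neighbour resize of a (512, 512) class map to (height, width).

    Nearest-neighbour keeps every pixel a valid class index; bilinear
    interpolation would blend indices into meaningless values.
    """
    src_h, src_w = class_map.shape
    rows = np.arange(height) * src_h // height  # source row for each output row
    cols = np.arange(width) * src_w // width    # source column for each output column
    return class_map[rows[:, None], cols[None, :]]


# usage: a hypothetical original image 768 pixels high and 1024 wide
class_map = np.random.randint(0, 20, size=(512, 512))
full_size = class_map_to_size(class_map, 768, 1024)
```

Pillow's own nearest-neighbour resize on a `uint8` copy of the class map would work as well; the NumPy version just avoids the round trip through an image object.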

Class indices

This is the list of classes that the model can detect (some classes are not identified yet, see below):

0: "background",
1: "unknown",
2: "hair",
3: "unknown",
4: "glasses",
5: "top-clothes",
6: "unknown",
7: "unknown",
8: "unknown",
9: "bottom-clothes",
10: "torso-skin",
11: "unknown",
12: "unknown",
13: "face",
14: "left-arm",
15: "right-arm",
16: "left-leg",
17: "right-leg",
18: "left-foot",
19: "right-foot",
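Because several indices share the name "unknown", a mapping keyed by index is safer than one keyed by name. A minimal sketch, assuming a (512, 512) integer class map such as the one produced by the `argmax` in the usage example; `CLASS_NAMES` and `part_coverage` are hypothetical names, and the labels are copied from the list above.

```python
import numpy as np

# class index -> name, copied from the list above ("unknown" entries included)
CLASS_NAMES = {
    0: "background", 1: "unknown", 2: "hair", 3: "unknown", 4: "glasses",
    5: "top-clothes", 6: "unknown", 7: "unknown", 8: "unknown",
    9: "bottom-clothes", 10: "torso-skin", 11: "unknown", 12: "unknown",
    13: "face", 14: "left-arm", 15: "right-arm", 16: "left-leg",
    17: "right-leg", 18: "left-foot", 19: "right-foot",
}


def part_coverage(class_map):
    """Return {class index: (name, fraction of pixels)} for classes present."""
    counts = np.bincount(class_map.ravel(), minlength=len(CLASS_NAMES))
    total = class_map.size
    return {
        idx: (CLASS_NAMES[idx], counts[idx] / total)
        for idx in range(len(CLASS_NAMES))
        if counts[idx] > 0
    }


# usage: a toy class map whose top half is labelled "face"
class_map = np.zeros((512, 512), dtype=np.int64)
class_map[:256] = 13
for idx, (name, frac) in part_coverage(class_map).items():
    print(idx, name, f"{frac:.0%}")
# prints:
# 0 background 50%
# 13 face 50%
```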

Known limitations

The model may fail on portrait images, because it was trained on full-body images.
Some classes are unidentified: I could not find the original list of class names (help welcome!).
The model is not perfect and can fail on some images. I am not the author of the original model, so I cannot fix it.

License

The original model card specifies the CC0-1.0 license. I am not sure it is the right license for the model, but I kept it.

Under this license, you may use, share, modify, and distribute the model without any restriction.

Thanks to the authors of the model for sharing it and leaving it open to use.