---
license: cc0-1.0
tags:
- art
- computer vision
- Image segmentation
---

# DeepLabV3+ ResNet50 for human body parts segmentation

This is a very simple ONNX model that can segment human body parts.

## Why this model

This model is an ONNX conversion of [keras-io/deeplabv3p-resnet50](https://huggingface.co/keras-io/deeplabv3p-resnet50), which can segment human body parts. All the other models that I found were trained on city segmentation.

The original model was built for an old version of Keras and cannot be used with recent versions of TensorFlow, so I converted it to the ONNX format.

## Usage

Get the `deeplabv3p-resnet50-human.onnx` file and use it with the ONNX Runtime package (`onnxruntime`).

`model.run` returns a list of outputs. Converted to an array, the result is a `(1, 1, 512, 512, 20)` tensor:

- 1: number of outputs (you can squeeze it)
- 1: batch size (you can squeeze it)
- 512, 512: the size of the image (fixed)
- 20: number of classes, so you can take the `argmax` over this axis to get the class of each pixel
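
The shape handling described in this list can be tried without the model. The following is only a sketch on a dummy array: `raw` stands in for the array built from the output of `model.run`, and the class index written into it is made up for illustration.

```python
import numpy as np

# Stand-in for np.array(model.run(...)): zeros with the documented shape.
raw = np.zeros((1, 1, 512, 512, 20), dtype=np.float32)
# Pretend the pixel at row 100, column 200 scores highest for class 13.
raw[0, 0, 100, 200, 13] = 1.0

# Drop the "number of outputs" and "batch size" axes.
scores = raw.squeeze(axis=(0, 1))  # shape (512, 512, 20)
# Pick the best-scoring class for every pixel.
labels = scores.argmax(axis=-1)    # shape (512, 512)

print(scores.shape, labels.shape, labels[100, 200])
```
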

```python
import sys

import numpy as np
import onnxruntime
from PIL import Image

model = onnxruntime.InferenceSession("deeplabv3p-resnet50-human.onnx")

img = Image.open(sys.argv[1] if len(sys.argv) > 1 else "image.jpg")
# force 3 channels and the fixed 512x512 input size
img = img.convert("RGB").resize((512, 512))
# scale pixel values to [-1, 1]
img = np.array(img).astype(np.float32) / 127.5 - 1
# add the batch dimension; this assumes the model expects a (1, 512, 512, 3) input
img = img[np.newaxis, ...]

# infer
input_name = model.get_inputs()[0].name
output_name = model.get_outputs()[0].name
result = model.run([output_name], {input_name: img})

# result is a list with one output of shape (1, 512, 512, 20)
result = np.array(result[0])
# argmax over the classes, then remove the batch dimension
result = result.argmax(axis=3).squeeze(0)

# get the masks
for i in range(20):
    detected = result == i  # 512x512 boolean array of the pixels of class i
    mask = np.zeros(result.shape, dtype=np.uint8)
    mask[detected] = 255
    Image.fromarray(mask).show()  # or save, or return the mask...
```
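
Instead of opening one mask per class, the whole label map can be shown as a single color image. This is a minimal sketch, not part of the original model card: `colorize` and its random palette are hypothetical helpers, and the label map here is synthetic so the snippet runs without the model.

```python
import numpy as np

def colorize(labels: np.ndarray, num_classes: int = 20, seed: int = 0) -> np.ndarray:
    """Map an (H, W) array of class indices to an (H, W, 3) uint8 RGB image."""
    rng = np.random.default_rng(seed)
    # One random color per class; colors are arbitrary and may look similar.
    palette = rng.integers(0, 256, size=(num_classes, 3)).astype(np.uint8)
    palette[0] = 0  # keep the background class black
    return palette[labels]

# Synthetic label map: a square of class 2 on a background of class 0.
labels = np.zeros((512, 512), dtype=np.int64)
labels[100:200, 100:200] = 2
color = colorize(labels)
print(color.shape, color.dtype)
```

With the `result` array computed by the script above, `Image.fromarray(colorize(result)).show()` should display all classes at once.
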

## Classes index

This is the list of classes that the model can detect (some classes are not specifically identified, see below):

- 0: "background"
- 1: "unknown"
- 2: "hair"
- 3: "unknown"
- 4: "glasses"
- 5: "top-clothes"
- 6: "unknown"
- 7: "unknown"
- 8: "unknown"
- 9: "bottom-clothes"
- 10: "torso-skin"
- 11: "unknown"
- 12: "unknown"
- 13: "face"
- 14: "left-arm"
- 15: "right-arm"
- 16: "left-leg"
- 17: "right-leg"
- 18: "left-foot"
- 19: "right-foot"
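
The list above can be kept as a lookup table in code. This is only a sketch: `CLASS_NAMES`, `class_name`, and `class_indices` are hypothetical names, and the labels are copied from the list as-is, so the unidentified indices simply fall back to "unknown".

```python
# Labels copied from the list above; indices missing here are the "unknown" ones.
CLASS_NAMES = {
    0: "background",
    2: "hair",
    4: "glasses",
    5: "top-clothes",
    9: "bottom-clothes",
    10: "torso-skin",
    13: "face",
    14: "left-arm",
    15: "right-arm",
    16: "left-leg",
    17: "right-leg",
    18: "left-foot",
    19: "right-foot",
}

def class_name(index: int) -> str:
    """Return the label for a class index, or "unknown" if it is not identified."""
    return CLASS_NAMES.get(index, "unknown")

def class_indices(*names: str) -> list[int]:
    """Return the sorted class indices whose label is one of `names`."""
    return sorted(i for i, name in CLASS_NAMES.items() if name in names)

print(class_name(13))                          # face
print(class_indices("left-arm", "right-arm"))  # [14, 15]
```

With the `result` label map from the Usage script, `np.isin(result, class_indices("left-arm", "right-arm"))` would give a single boolean mask covering both arms.
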

## Known limitations

- The model may fail on portrait images, because it was trained on "full body" images.
- I don't know what some of the classes are. I can't find the list of classes (help is welcome!).
- The model is not perfect and can fail on some images. I'm not the author of the model, so I can't fix it.

## License

The [original model card](https://huggingface.co/keras-io/deeplabv3p-resnet50/blob/main/README.md) specifies the "CC0-1.0" license. I don't know if it's the right license for the model, but I keep it.

> Anyway, thanks to the authors of the model for sharing it and for leaving it open to use.

CC0-1.0 means that you may use, share, modify, and distribute the model without any restriction.