# Chart-based Dense Pose Estimation for Humans and Animals

## Overview
The goal of chart-based DensePose methods is to establish dense correspondences between image pixels and a 3D object mesh by splitting the latter into charts and estimating, for each pixel, the corresponding chart index I and local chart coordinates (U, V). The charts used for human DensePose estimation are shown in Figure 1: the human body is split into 24 parts, and each part is parametrized by U and V coordinates, each taking values in [0, 1].
The pipeline uses Faster R-CNN with a Feature Pyramid Network meta-architecture, outlined in Figure 2. For each detected object, the model predicts its coarse segmentation S (2 or 15 channels: foreground / background, or background + 14 predefined body parts), fine segmentation I (25 channels: background + 24 predefined body parts) and local chart coordinates U and V.
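To make the representation concrete, the sketch below shows one way to turn these per-detection outputs into dense correspondences: take the argmax over the fine segmentation channels to obtain the chart index I, then read off U and V from the channels of the winning chart. The tensor names, shapes and the helper itself are illustrative assumptions, not the actual DensePose API:

```python
import torch

def decode_chart_outputs(fine_segm, u, v):
    """Turn raw chart-based head outputs for one detection into per-pixel
    chart indices and (U, V) coordinates. Shapes are illustrative:
    fine_segm, u and v are all (25, H, W) tensors (background + 24 charts)."""
    # Chart index I: argmax over the 25 fine segmentation channels
    # (channel 0 is background).
    i = fine_segm.argmax(dim=0)                            # (H, W)
    # Select the U, V values regressed for the winning chart at each pixel.
    u_map = torch.gather(u, 0, i.unsqueeze(0)).squeeze(0)  # (H, W)
    v_map = torch.gather(v, 0, i.unsqueeze(0)).squeeze(0)  # (H, W)
    return i, u_map.clamp(0, 1), v_map.clamp(0, 1)
```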
## Bootstrapping Chart-Based Models

Sanakoyeu et al., 2020 introduced a pipeline to transfer DensePose models trained on humans to proximal animal classes (chimpanzees); it is summarized in Figure 3. The training proceeds in two stages:

1. First, a master model is trained on data from the source domain (humans with full DensePose annotations S, I, U and V) and the supporting domain (animals with segmentation annotations only). Only selected animal classes are chosen from the supporting domain through category filters to guarantee the quality of target-domain results. The training is done in a class-agnostic manner: all selected categories are mapped to a single category (human).

2. Second, a student model is trained on data from the source and supporting domains, as well as on data from the target domain obtained by applying the master model, selecting high-confidence detections and sampling the results.

Examples of pretrained master and student models are available in the Model Zoo. For more details on the bootstrapping pipeline, please see Bootstrapping Pipeline.
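As a rough, self-contained illustration of the target-domain data step in the second stage, the snippet below filters master-model detections by confidence and caps how many pseudo-labels are kept per image. The dictionary structure, the 0.9 threshold and the per-image cap are assumptions for illustration only; the actual pipeline works with detectron2 instances and dedicated data samplers:

```python
def sample_pseudo_labels(detections, min_score=0.9, max_per_image=3):
    """Keep only high-confidence master-model detections as pseudo-labels
    for student training (structures and thresholds are illustrative)."""
    confident = [d for d in detections if d["score"] >= min_score]
    confident.sort(key=lambda d: d["score"], reverse=True)
    return confident[:max_per_image]

# Toy usage: three detections produced by the master on one target image.
detections = [
    {"score": 0.97, "chart_outputs": "..."},
    {"score": 0.42, "chart_outputs": "..."},
    {"score": 0.93, "chart_outputs": "..."},
]
print([d["score"] for d in sample_pseudo_labels(detections)])  # [0.97, 0.93]
```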
## Datasets

For more details on the datasets used for chart-based model training and validation, please refer to the DensePose Datasets page.
## Model Zoo and Baselines

### Legacy Models

Baselines trained using schedules from Güler et al., 2018.
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_s1x_legacy | s1x | 0.307 | 0.051 | 3.2 | 58.1 | 58.2 | 52.1 | 54.9 | 164832157 | model \| metrics |
| R_101_FPN_s1x_legacy | s1x | 0.390 | 0.063 | 4.3 | 59.5 | 59.3 | 53.2 | 56.0 | 164832182 | model \| metrics |
### Improved Baselines, Original Fully Convolutional Head

These models use an improved training schedule and the Panoptic FPN head from Kirillov et al., 2019.
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_s1x | s1x | 0.359 | 0.066 | 4.5 | 61.2 | 67.2 | 63.7 | 65.3 | 165712039 | model \| metrics |
| R_101_FPN_s1x | s1x | 0.428 | 0.079 | 5.8 | 62.3 | 67.8 | 64.5 | 66.2 | 165712084 | model \| metrics |
### Improved Baselines, DeepLabV3 Head

These models use an improved training schedule, the Panoptic FPN head from Kirillov et al., 2019 and the DeepLabV3 head from Chen et al., 2017.
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_DL_s1x | s1x | 0.392 | 0.070 | 6.7 | 61.1 | 68.3 | 65.6 | 66.7 | 165712097 | model \| metrics |
| R_101_FPN_DL_s1x | s1x | 0.478 | 0.083 | 7.0 | 62.3 | 68.7 | 66.3 | 67.6 | 165712116 | model \| metrics |
### Baselines with Confidence Estimation

These models additionally estimate confidence in the regressed UV coordinates, along the lines of Neverova et al., 2019; a schematic sketch of this type of loss follows the acronym list below.
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_WC1_s1x | s1x | 0.353 | 0.064 | 4.6 | 60.5 | 67.0 | 64.2 | 65.4 | 173862049 | model \| metrics |
| R_50_FPN_WC2_s1x | s1x | 0.364 | 0.066 | 4.8 | 60.7 | 66.9 | 64.2 | 65.7 | 173861455 | model \| metrics |
| R_50_FPN_DL_WC1_s1x | s1x | 0.397 | 0.068 | 6.7 | 61.1 | 68.1 | 65.8 | 67.0 | 173067973 | model \| metrics |
| R_50_FPN_DL_WC2_s1x | s1x | 0.410 | 0.070 | 6.8 | 60.8 | 67.9 | 65.6 | 66.7 | 173859335 | model \| metrics |
| R_101_FPN_WC1_s1x | s1x | 0.435 | 0.076 | 5.7 | 62.5 | 67.6 | 64.9 | 66.3 | 171402969 | model \| metrics |
| R_101_FPN_WC2_s1x | s1x | 0.450 | 0.078 | 5.7 | 62.3 | 67.6 | 64.8 | 66.4 | 173860702 | model \| metrics |
| R_101_FPN_DL_WC1_s1x | s1x | 0.479 | 0.081 | 7.9 | 62.0 | 68.4 | 66.2 | 67.2 | 173858525 | model \| metrics |
| R_101_FPN_DL_WC2_s1x | s1x | 0.491 | 0.082 | 7.6 | 61.7 | 68.3 | 65.9 | 67.2 | 173294801 | model \| metrics |
Acronyms:

- `WC1`: with confidence estimation model type 1 for U and V
- `WC2`: with confidence estimation model type 2 for U and V
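Conceptually, these models treat the UV regression error as Gaussian with a predicted per-pixel spread and train with a negative log-likelihood, so pixels flagged as uncertain contribute less to the regression term at the cost of a log-variance penalty. The function below is a schematic sketch of that idea under an isotropic Gaussian assumption, not the exact loss used in the library:

```python
import torch

def uv_gaussian_nll(u_pred, v_pred, u_gt, v_gt, log_sigma):
    """Schematic per-pixel negative log-likelihood for UV regression with
    a predicted isotropic Gaussian uncertainty; a sketch of the idea
    behind WC-style losses, not the library's actual implementation."""
    sigma2 = torch.exp(2.0 * log_sigma)  # predicted variance, kept positive
    sq_err = (u_pred - u_gt) ** 2 + (v_pred - v_gt) ** 2
    # Large errors are down-weighted where the model reports low confidence
    # (large sigma), at the price of the log-sigma penalty.
    return (sq_err / (2.0 * sigma2) + log_sigma).mean()
```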
### Baselines with Mask Confidence Estimation

These models estimate confidence in the regressed UV coordinates as well as confidences associated with coarse and fine segmentation; see Sanakoyeu et al., 2020 for details.
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_WC1M_s1x | s1x | 0.381 | 0.066 | 4.8 | 60.6 | 66.7 | 64.0 | 65.4 | 217144516 | model \| metrics |
| R_50_FPN_WC2M_s1x | s1x | 0.342 | 0.068 | 5.0 | 60.7 | 66.9 | 64.2 | 65.5 | 216245640 | model \| metrics |
| R_50_FPN_DL_WC1M_s1x | s1x | 0.371 | 0.068 | 6.0 | 60.7 | 68.0 | 65.2 | 66.7 | 216245703 | model \| metrics |
| R_50_FPN_DL_WC2M_s1x | s1x | 0.385 | 0.071 | 6.1 | 60.8 | 68.1 | 65.0 | 66.4 | 216245758 | model \| metrics |
| R_101_FPN_WC1M_s1x | s1x | 0.423 | 0.079 | 5.9 | 62.0 | 67.3 | 64.8 | 66.0 | 216453687 | model \| metrics |
| R_101_FPN_WC2M_s1x | s1x | 0.436 | 0.080 | 5.9 | 62.5 | 67.4 | 64.5 | 66.0 | 216245682 | model \| metrics |
| R_101_FPN_DL_WC1M_s1x | s1x | 0.453 | 0.079 | 6.8 | 62.0 | 68.1 | 66.4 | 67.1 | 216245771 | model \| metrics |
| R_101_FPN_DL_WC2M_s1x | s1x | 0.464 | 0.080 | 6.9 | 61.9 | 68.2 | 66.1 | 67.1 | 216245790 | model \| metrics |
Acronyms:

- `WC1M`: with confidence estimation model type 1 for U and V and mask confidence estimation
- `WC2M`: with confidence estimation model type 2 for U and V and mask confidence estimation
### Bootstrapping Baselines

Master and student models trained using the bootstrapping pipeline with chimpanzee as the target category; see Sanakoyeu et al., 2020 and Bootstrapping Pipeline for details. Evaluation is performed on the DensePose Chimps dataset.
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. APex GPS | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_DL_WC1M_3x_Atop10P_CA | 3x | 0.522 | 0.073 | 9.7 | 61.3 | 59.1 | 36.2 | 20.0 | 30.2 | 217578784 | model \| metrics |
| R_50_FPN_DL_WC1M_3x_Atop10P_CA_B_uniform | 3x | 1.939 | 0.072 | 10.1 | 60.9 | 58.5 | 37.2 | 21.5 | 31.0 | 256453729 | model \| metrics |
| R_50_FPN_DL_WC1M_3x_Atop10P_CA_B_uv | 3x | 1.985 | 0.072 | 9.6 | 61.4 | 58.9 | 38.3 | 22.2 | 32.1 | 256452095 | model \| metrics |
| R_50_FPN_DL_WC1M_3x_Atop10P_CA_B_finesegm | 3x | 2.047 | 0.072 | 10.3 | 60.9 | 58.5 | 36.7 | 20.7 | 30.7 | 256452819 | model \| metrics |
| R_50_FPN_DL_WC1M_3x_Atop10P_CA_B_coarsesegm | 3x | 1.830 | 0.070 | 9.6 | 61.3 | 59.2 | 37.9 | 21.5 | 31.6 | 256455697 | model \| metrics |
Acronyms:

- `WC1M`: with confidence estimation model type 1 for U and V and mask confidence estimation
- `Atop10P`: humans and animals from the 10 most suitable categories are used for training
- `CA`: class-agnostic training, where all annotated instances are mapped into a single category
- `B_<...>`: schedule with bootstrapping with the specified results sampling strategy
Note:

The relaxed dp. APex GPS metric was used in Sanakoyeu et al., 2020 to evaluate DensePose results. This metric considers matches at thresholds 0.2, 0.3 and 0.4 in addition to the standard ones used in the evaluation protocol. The minimum threshold is controlled by the `DENSEPOSE_EVALUATION.MIN_IOU_THRESHOLD` config option.
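For instance, assuming the standard DensePose project setup with its `add_densepose_config` helper, the threshold could be lowered as follows (a minimal sketch; the default value and the surrounding evaluation code are omitted):

```python
from detectron2.config import get_cfg
from densepose import add_densepose_config  # the DensePose project must be importable

cfg = get_cfg()
add_densepose_config(cfg)  # registers DENSEPOSE_EVALUATION options, among others
# Also count matches at the relaxed thresholds (0.2, 0.3, 0.4):
cfg.DENSEPOSE_EVALUATION.MIN_IOU_THRESHOLD = 0.2
```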
## License

All models available for download are licensed under the Creative Commons Attribution-ShareAlike 3.0 license.
## References

If you use chart-based DensePose methods, please take the references from the following BibTeX entries:
DensePose bootstrapping pipeline:

```BibTeX
@InProceedings{Sanakoyeu2020TransferringDensePose,
  title = {Transferring Dense Pose to Proximal Animal Classes},
  author = {Artsiom Sanakoyeu and Vasil Khalidov and Maureen S. McCarthy and Andrea Vedaldi and Natalia Neverova},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2020},
}
```
DensePose with confidence estimation:

```BibTeX
@InProceedings{Neverova2019DensePoseConfidences,
  title = {Correlated Uncertainty for Learning Dense Correspondences from Noisy Labels},
  author = {Neverova, Natalia and Novotny, David and Vedaldi, Andrea},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2019},
}
```
Original DensePose:

```BibTeX
@InProceedings{Guler2018DensePose,
  title = {DensePose: Dense Human Pose Estimation In The Wild},
  author = {R{\i}za Alp G\"uler and Natalia Neverova and Iasonas Kokkinos},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2018},
}
```