# Chart-based Dense Pose Estimation for Humans and Animals

## Overview
The goal of chart-based DensePose methods is to establish dense correspondences between image pixels and a 3D object mesh by splitting the latter into charts and estimating, for each pixel, the corresponding chart index I and local chart coordinates (U, V). The charts used for human DensePose estimation are shown in Figure 1: the human body is split into 24 parts, and each part is parametrized by U and V coordinates, each taking values in [0, 1].
The pipeline uses Faster R-CNN with a Feature Pyramid Network meta-architecture, outlined in Figure 2. For each detected object, the model predicts its coarse segmentation S (2 or 15 channels: foreground / background, or background + 14 predefined body parts), fine segmentation I (25 channels: background + 24 predefined body parts) and local chart coordinates U and V.
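To make the representation concrete, the sketch below shows one way to turn these per-detection outputs into dense correspondences: take the argmax over the fine segmentation channels to obtain the chart index I, then read off U and V from the channels of the winning chart. The tensor names, shapes and the helper itself are illustrative assumptions, not the actual DensePose API:

```python
import torch

def decode_chart_outputs(fine_segm, u, v):
    """Turn raw chart-based head outputs for one detection into per-pixel
    chart indices and (U, V) coordinates. Shapes are illustrative:
    fine_segm, u and v are all (25, H, W) tensors (background + 24 charts)."""
    # Chart index I: argmax over the 25 fine segmentation channels
    # (channel 0 is background).
    i = fine_segm.argmax(dim=0)                            # (H, W)
    # Select the U, V values regressed for the winning chart at each pixel.
    u_map = torch.gather(u, 0, i.unsqueeze(0)).squeeze(0)  # (H, W)
    v_map = torch.gather(v, 0, i.unsqueeze(0)).squeeze(0)  # (H, W)
    return i, u_map.clamp(0, 1), v_map.clamp(0, 1)
```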
## Bootstrapping Chart-Based Models

Sanakoyeu et al., 2020 introduced a pipeline to transfer DensePose models trained on humans to proximal animal classes (chimpanzees); it is summarized in Figure 3. The training proceeds in two stages:

1. First, a master model is trained on data from the source domain (humans with full DensePose annotations S, I, U and V) and the supporting domain (animals with segmentation annotations only). Only selected animal classes are chosen from the supporting domain through category filters to guarantee the quality of target-domain results. The training is done in a class-agnostic manner: all selected categories are mapped to a single category (human).

2. Second, a student model is trained on data from the source and supporting domains, as well as on data from the target domain obtained by applying the master model, selecting high-confidence detections and sampling the results.

Examples of pretrained master and student models are available in the Model Zoo. For more details on the bootstrapping pipeline, please see Bootstrapping Pipeline.
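As a rough, self-contained illustration of the target-domain data step in the second stage, the snippet below filters master-model detections by confidence and caps how many pseudo-labels are kept per image. The dictionary structure, the 0.9 threshold and the per-image cap are assumptions for illustration only; the actual pipeline works with detectron2 instances and dedicated data samplers:

```python
def sample_pseudo_labels(detections, min_score=0.9, max_per_image=3):
    """Keep only high-confidence master-model detections as pseudo-labels
    for student training (structures and thresholds are illustrative)."""
    confident = [d for d in detections if d["score"] >= min_score]
    confident.sort(key=lambda d: d["score"], reverse=True)
    return confident[:max_per_image]

# Toy usage: three detections produced by the master on one target image.
detections = [
    {"score": 0.97, "chart_outputs": "..."},
    {"score": 0.42, "chart_outputs": "..."},
    {"score": 0.93, "chart_outputs": "..."},
]
print([d["score"] for d in sample_pseudo_labels(detections)])  # [0.97, 0.93]
```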
## Datasets

For more details on the datasets used for chart-based model training and validation, please refer to the DensePose Datasets page.
## Model Zoo and Baselines

### Legacy Models

Baselines trained using schedules from Güler et al., 2018.
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_s1x_legacy | s1x | 0.307 | 0.051 | 3.2 | 58.1 | 58.2 | 52.1 | 54.9 | 164832157 | model \| metrics |
| R_101_FPN_s1x_legacy | s1x | 0.390 | 0.063 | 4.3 | 59.5 | 59.3 | 53.2 | 56.0 | 164832182 | model \| metrics |
### Improved Baselines, Original Fully Convolutional Head

These models use an improved training schedule and the Panoptic FPN head from Kirillov et al., 2019.
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_s1x | s1x | 0.359 | 0.066 | 4.5 | 61.2 | 67.2 | 63.7 | 65.3 | 165712039 | model \| metrics |
| R_101_FPN_s1x | s1x | 0.428 | 0.079 | 5.8 | 62.3 | 67.8 | 64.5 | 66.2 | 165712084 | model \| metrics |
### Improved Baselines, DeepLabV3 Head

These models use an improved training schedule, the Panoptic FPN head from Kirillov et al., 2019 and the DeepLabV3 head from Chen et al., 2017.
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_DL_s1x | s1x | 0.392 | 0.070 | 6.7 | 61.1 | 68.3 | 65.6 | 66.7 | 165712097 | model \| metrics |
| R_101_FPN_DL_s1x | s1x | 0.478 | 0.083 | 7.0 | 62.3 | 68.7 | 66.3 | 67.6 | 165712116 | model \| metrics |
### Baselines with Confidence Estimation

These models additionally estimate confidence in the regressed UV coordinates, along the lines of Neverova et al., 2019; a schematic sketch of this type of loss follows the acronym list below.
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_WC1_s1x | s1x | 0.353 | 0.064 | 4.6 | 60.5 | 67.0 | 64.2 | 65.4 | 173862049 | model \| metrics |
| R_50_FPN_WC2_s1x | s1x | 0.364 | 0.066 | 4.8 | 60.7 | 66.9 | 64.2 | 65.7 | 173861455 | model \| metrics |
| R_50_FPN_DL_WC1_s1x | s1x | 0.397 | 0.068 | 6.7 | 61.1 | 68.1 | 65.8 | 67.0 | 173067973 | model \| metrics |
| R_50_FPN_DL_WC2_s1x | s1x | 0.410 | 0.070 | 6.8 | 60.8 | 67.9 | 65.6 | 66.7 | 173859335 | model \| metrics |
| R_101_FPN_WC1_s1x | s1x | 0.435 | 0.076 | 5.7 | 62.5 | 67.6 | 64.9 | 66.3 | 171402969 | model \| metrics |
| R_101_FPN_WC2_s1x | s1x | 0.450 | 0.078 | 5.7 | 62.3 | 67.6 | 64.8 | 66.4 | 173860702 | model \| metrics |
| R_101_FPN_DL_WC1_s1x | s1x | 0.479 | 0.081 | 7.9 | 62.0 | 68.4 | 66.2 | 67.2 | 173858525 | model \| metrics |
| R_101_FPN_DL_WC2_s1x | s1x | 0.491 | 0.082 | 7.6 | 61.7 | 68.3 | 65.9 | 67.2 | 173294801 | model \| metrics |
Acronyms:

- `WC1`: with confidence estimation model type 1 for U and V
- `WC2`: with confidence estimation model type 2 for U and V
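Conceptually, these models treat the UV regression error as Gaussian with a predicted per-pixel spread and train with a negative log-likelihood, so pixels flagged as uncertain contribute less to the regression term at the cost of a log-variance penalty. The function below is a schematic sketch of that idea under an isotropic Gaussian assumption, not the exact loss used in the library:

```python
import torch

def uv_gaussian_nll(u_pred, v_pred, u_gt, v_gt, log_sigma):
    """Schematic per-pixel negative log-likelihood for UV regression with
    a predicted isotropic Gaussian uncertainty; a sketch of the idea
    behind WC-style losses, not the library's actual implementation."""
    sigma2 = torch.exp(2.0 * log_sigma)  # predicted variance, kept positive
    sq_err = (u_pred - u_gt) ** 2 + (v_pred - v_gt) ** 2
    # Large errors are down-weighted where the model reports low confidence
    # (large sigma), at the price of the log-sigma penalty.
    return (sq_err / (2.0 * sigma2) + log_sigma).mean()
```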
### Baselines with Mask Confidence Estimation

These models estimate confidence in the regressed UV coordinates as well as confidences associated with coarse and fine segmentation; see Sanakoyeu et al., 2020 for details.
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_WC1M_s1x | s1x | 0.381 | 0.066 | 4.8 | 60.6 | 66.7 | 64.0 | 65.4 | 217144516 | model \| metrics |
| R_50_FPN_WC2M_s1x | s1x | 0.342 | 0.068 | 5.0 | 60.7 | 66.9 | 64.2 | 65.5 | 216245640 | model \| metrics |
| R_50_FPN_DL_WC1M_s1x | s1x | 0.371 | 0.068 | 6.0 | 60.7 | 68.0 | 65.2 | 66.7 | 216245703 | model \| metrics |
| R_50_FPN_DL_WC2M_s1x | s1x | 0.385 | 0.071 | 6.1 | 60.8 | 68.1 | 65.0 | 66.4 | 216245758 | model \| metrics |
| R_101_FPN_WC1M_s1x | s1x | 0.423 | 0.079 | 5.9 | 62.0 | 67.3 | 64.8 | 66.0 | 216453687 | model \| metrics |
| R_101_FPN_WC2M_s1x | s1x | 0.436 | 0.080 | 5.9 | 62.5 | 67.4 | 64.5 | 66.0 | 216245682 | model \| metrics |
| R_101_FPN_DL_WC1M_s1x | s1x | 0.453 | 0.079 | 6.8 | 62.0 | 68.1 | 66.4 | 67.1 | 216245771 | model \| metrics |
| R_101_FPN_DL_WC2M_s1x | s1x | 0.464 | 0.080 | 6.9 | 61.9 | 68.2 | 66.1 | 67.1 | 216245790 | model \| metrics |
Acronyms:

- `WC1M`: with confidence estimation model type 1 for U and V and mask confidence estimation
- `WC2M`: with confidence estimation model type 2 for U and V and mask confidence estimation
### Bootstrapping Baselines

Master and student models trained using the bootstrapping pipeline with chimpanzee as the target category; see Sanakoyeu et al., 2020 and Bootstrapping Pipeline for details. Evaluation is performed on the DensePose Chimps dataset.
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. APex GPS | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_DL_WC1M_3x_Atop10P_CA | 3x | 0.522 | 0.073 | 9.7 | 61.3 | 59.1 | 36.2 | 20.0 | 30.2 | 217578784 | model \| metrics |
| R_50_FPN_DL_WC1M_3x_Atop10P_CA_B_uniform | 3x | 1.939 | 0.072 | 10.1 | 60.9 | 58.5 | 37.2 | 21.5 | 31.0 | 256453729 | model \| metrics |
| R_50_FPN_DL_WC1M_3x_Atop10P_CA_B_uv | 3x | 1.985 | 0.072 | 9.6 | 61.4 | 58.9 | 38.3 | 22.2 | 32.1 | 256452095 | model \| metrics |
| R_50_FPN_DL_WC1M_3x_Atop10P_CA_B_finesegm | 3x | 2.047 | 0.072 | 10.3 | 60.9 | 58.5 | 36.7 | 20.7 | 30.7 | 256452819 | model \| metrics |
| R_50_FPN_DL_WC1M_3x_Atop10P_CA_B_coarsesegm | 3x | 1.830 | 0.070 | 9.6 | 61.3 | 59.2 | 37.9 | 21.5 | 31.6 | 256455697 | model \| metrics |
Acronyms:

- `WC1M`: with confidence estimation model type 1 for U and V and mask confidence estimation
- `Atop10P`: humans and animals from the 10 most suitable categories are used for training
- `CA`: class-agnostic training, where all annotated instances are mapped into a single category
- `B_<...>`: schedule with bootstrapping with the specified results sampling strategy
Note:

The relaxed dp. APex GPS metric was used in Sanakoyeu et al., 2020 to evaluate DensePose results. This metric considers matches at thresholds 0.2, 0.3 and 0.4 in addition to the standard ones used in the evaluation protocol. The minimum threshold is controlled by the `DENSEPOSE_EVALUATION.MIN_IOU_THRESHOLD` config option.
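For instance, assuming the standard DensePose project setup with its `add_densepose_config` helper, the threshold could be lowered as follows (a minimal sketch; the default value and the surrounding evaluation code are omitted):

```python
from detectron2.config import get_cfg
from densepose import add_densepose_config  # the DensePose project must be importable

cfg = get_cfg()
add_densepose_config(cfg)  # registers DENSEPOSE_EVALUATION options, among others
# Also count matches at the relaxed thresholds (0.2, 0.3, 0.4):
cfg.DENSEPOSE_EVALUATION.MIN_IOU_THRESHOLD = 0.2
```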
## License

All models available for download are licensed under the Creative Commons Attribution-ShareAlike 3.0 license.
## References

If you use chart-based DensePose methods, please take the references from the following BibTeX entries:
DensePose bootstrapping pipeline:

```BibTeX
@InProceedings{Sanakoyeu2020TransferringDensePose,
  title = {Transferring Dense Pose to Proximal Animal Classes},
  author = {Artsiom Sanakoyeu and Vasil Khalidov and Maureen S. McCarthy and Andrea Vedaldi and Natalia Neverova},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2020},
}
```
DensePose with confidence estimation:

```BibTeX
@InProceedings{Neverova2019DensePoseConfidences,
  title = {Correlated Uncertainty for Learning Dense Correspondences from Noisy Labels},
  author = {Neverova, Natalia and Novotny, David and Vedaldi, Andrea},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2019},
}
```
Original DensePose:

```BibTeX
@InProceedings{Guler2018DensePose,
  title = {DensePose: Dense Human Pose Estimation In The Wild},
  author = {R{\i}za Alp G\"uler and Natalia Neverova and Iasonas Kokkinos},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2018},
}
```