# Chart-based Dense Pose Estimation for Humans and Animals

## Overview

The goal of chart-based DensePose methods is to establish dense correspondences between image pixels and a 3D object mesh by splitting the latter into charts and estimating, for each pixel, the corresponding chart index `I` and local chart coordinates `(U, V)`.

The charts used for human DensePose estimation are shown in Figure 1. The human body is split into 24 parts; each part is parametrized by `U` and `V` coordinates, each taking values in `[0, 1]` (see the sketch after Figure 1).

Figure 1. Partitioning and parametrization of human body surface.
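To make the parametrization concrete, the sketch below (an illustration with synthetic data, not part of the DensePose API) packs per-pixel chart indices and `(U, V)` coordinates into the 3-channel 8-bit IUV encoding commonly used to store and visualize DensePose results:

```
import numpy as np

# Hypothetical per-pixel predictions for an H x W person crop:
# i_map        - chart (body part) index in {0, ..., 24}, 0 is background
# u_map, v_map - local chart coordinates, each in [0, 1]
H, W = 256, 192
i_map = np.random.randint(0, 25, (H, W))
u_map = np.random.rand(H, W).astype(np.float32)
v_map = np.random.rand(H, W).astype(np.float32)

# 8-bit IUV image: channel 0 stores the chart index, channels 1 and 2
# store U and V rescaled from [0, 1] to [0, 255].
iuv = np.stack(
    [
        i_map.astype(np.uint8),
        (u_map * 255).astype(np.uint8),
        (v_map * 255).astype(np.uint8),
    ],
    axis=-1,
)
```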

The pipeline uses [Faster R-CNN](https://arxiv.org/abs/1506.01497) with the [Feature Pyramid Network](https://arxiv.org/abs/1612.03144) meta-architecture outlined in Figure 2. For each detected object, the model predicts its coarse segmentation `S` (2 or 15 channels: foreground / background or background + 14 predefined body parts), fine segmentation `I` (25 channels: background + 24 predefined body parts) and local chart coordinates `U` and `V`; a sketch of how these outputs combine into per-pixel predictions follows Figure 2.

Figure 2. DensePose chart-based architecture based on Faster R-CNN with Feature Pyramid Network (FPN).
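The sketch below shows how these per-detection head outputs relate to the final per-pixel predictions: the chart index is the argmax over the fine segmentation channels, and `U`, `V` are read from the channel selected by that index. Tensor sizes are illustrative assumptions; the actual grid resolution depends on the config:

```
import torch

# Hypothetical raw head outputs for one detected person on an S x S grid:
S = 112
coarse_segm = torch.randn(15, S, S)  # or 2 channels, depending on the config
fine_segm = torch.randn(25, S, S)    # background + 24 body parts
u = torch.rand(25, S, S)             # per-part U regression
v = torch.rand(25, S, S)             # per-part V regression

# Per-pixel chart index: argmax over the fine segmentation channels.
i_map = fine_segm.argmax(dim=0)                     # (S, S), values in {0..24}
# U and V are regressed per part; pick the channel selected by i_map.
u_map = u.gather(0, i_map.unsqueeze(0)).squeeze(0)  # (S, S)
v_map = v.gather(0, i_map.unsqueeze(0)).squeeze(0)  # (S, S)
```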

### Bootstrapping Chart-Based Models

[Sanakoyeu et al., 2020](https://arxiv.org/pdf/2003.00080.pdf) introduced a pipeline to transfer DensePose models trained on humans to proximal animal classes (chimpanzees), summarized in Figure 3. Training proceeds in two stages (see the sketch after Figure 3).

First, a *master* model is trained on data from the source domain (humans with full DensePose annotations `S`, `I`, `U` and `V`) and the supporting domain (animals with segmentation annotations only). Only selected animal classes are chosen from the supporting domain through *category filters* to guarantee the quality of target domain results. Training is done in a *class-agnostic* manner: all selected categories are mapped to a single category (human).

Second, a *student* model is trained on data from the source and supporting domains, as well as on data from the target domain obtained by applying the master model, selecting high-confidence detections and sampling the results.

Figure 3. Domain adaptation: master model is trained on data from source and supporting domains to produce predictions in target domain; student model combines data from source and supporting domains, as well as sampled predictions from the master model on target domain to improve target domain predictions quality.
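A minimal sketch of this two-stage procedure in plain Python, using stand-ins rather than the actual config-driven DensePose training code (all names below are hypothetical):

```
def train_master(source, supporting, allowed_categories):
    # Category filter: keep only suitable animal classes.
    kept = [s for s in supporting if s["category"] in allowed_categories]
    # Class-agnostic training: map every instance to a single category.
    data = [dict(s, category="human") for s in source + kept]
    return fit(data)

def train_student(master, source, supporting, target_images, thr=0.9):
    # Pseudo-labels: run the master on unlabeled target images and keep
    # sampled results from high-confidence detections only.
    pseudo = [det for img in target_images
              for det in master(img) if det["score"] > thr]
    return fit(source + supporting + pseudo)

def fit(data):
    # Stand-in: in the real pipeline this trains a DensePose model.
    return lambda image: []
```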

Examples of pretrained master and student models are available in the [Model Zoo](#ModelZooBootstrap). For more details on the bootstrapping pipeline, please see [Bootstrapping Pipeline](BOOTSTRAPPING_PIPELINE.md).

### Datasets

For more details on the datasets used for chart-based model training and validation, please refer to the [DensePose Datasets](DENSEPOSE_DATASETS.md) page.

## Model Zoo and Baselines

### Legacy Models

Baselines trained using schedules from [Güler et al., 2018](https://arxiv.org/pdf/1802.00434.pdf):
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_s1x_legacy | s1x | 0.307 | 0.051 | 3.2 | 58.1 | 58.2 | 52.1 | 54.9 | 164832157 | model \| metrics |
| R_101_FPN_s1x_legacy | s1x | 0.390 | 0.063 | 4.3 | 59.5 | 59.3 | 53.2 | 56.0 | 164832182 | model \| metrics |
### Improved Baselines, Original Fully Convolutional Head

These models use an improved training schedule and the Panoptic FPN head from [Kirillov et al., 2019](https://arxiv.org/abs/1901.02446); an inference example follows the table below.
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_s1x | s1x | 0.359 | 0.066 | 4.5 | 61.2 | 67.2 | 63.7 | 65.3 | 165712039 | model \| metrics |
| R_101_FPN_s1x | s1x | 0.428 | 0.079 | 5.8 | 62.3 | 67.8 | 64.5 | 66.2 | 165712084 | model \| metrics |
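As a usage sketch for the models above, the snippet below runs one of them with detectron2's `DefaultPredictor`. The config path assumes the DensePose project layout; the weight URL is a placeholder to be replaced with the corresponding `model` link from the table:

```
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from densepose import add_densepose_config
import cv2

cfg = get_cfg()
add_densepose_config(cfg)
cfg.merge_from_file("projects/DensePose/configs/densepose_rcnn_R_50_FPN_s1x.yaml")
cfg.MODEL.WEIGHTS = "<URL of the `model` link for R_50_FPN_s1x>"
predictor = DefaultPredictor(cfg)

image = cv2.imread("image.jpg")  # BGR image, as expected by detectron2
outputs = predictor(image)["instances"]
# outputs.pred_densepose holds the chart predictions (S, I, U, V) per detection.
```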
### Improved Baselines, DeepLabV3 Head

These models use an improved training schedule, the Panoptic FPN head from [Kirillov et al., 2019](https://arxiv.org/abs/1901.02446) and the DeepLabV3 head from [Chen et al., 2017](https://arxiv.org/abs/1706.05587).
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_DL_s1x | s1x | 0.392 | 0.070 | 6.7 | 61.1 | 68.3 | 65.6 | 66.7 | 165712097 | model \| metrics |
| R_101_FPN_DL_s1x | s1x | 0.478 | 0.083 | 7.0 | 62.3 | 68.7 | 66.3 | 67.6 | 165712116 | model \| metrics |
### Baselines with Confidence Estimation

These models additionally estimate confidence in the regressed UV coordinates, along the lines of [Neverova et al., 2019](https://papers.nips.cc/paper/8378-correlated-uncertainty-for-learning-dense-correspondences-from-noisy-labels); a sketch of the underlying objective is given at the end of this subsection.
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_WC1_s1x | s1x | 0.353 | 0.064 | 4.6 | 60.5 | 67.0 | 64.2 | 65.4 | 173862049 | model \| metrics |
| R_50_FPN_WC2_s1x | s1x | 0.364 | 0.066 | 4.8 | 60.7 | 66.9 | 64.2 | 65.7 | 173861455 | model \| metrics |
| R_50_FPN_DL_WC1_s1x | s1x | 0.397 | 0.068 | 6.7 | 61.1 | 68.1 | 65.8 | 67.0 | 173067973 | model \| metrics |
| R_50_FPN_DL_WC2_s1x | s1x | 0.410 | 0.070 | 6.8 | 60.8 | 67.9 | 65.6 | 66.7 | 173859335 | model \| metrics |
| R_101_FPN_WC1_s1x | s1x | 0.435 | 0.076 | 5.7 | 62.5 | 67.6 | 64.9 | 66.3 | 171402969 | model \| metrics |
| R_101_FPN_WC2_s1x | s1x | 0.450 | 0.078 | 5.7 | 62.3 | 67.6 | 64.8 | 66.4 | 173860702 | model \| metrics |
| R_101_FPN_DL_WC1_s1x | s1x | 0.479 | 0.081 | 7.9 | 62.0 | 68.4 | 66.2 | 67.2 | 173858525 | model \| metrics |
| R_101_FPN_DL_WC2_s1x | s1x | 0.491 | 0.082 | 7.6 | 61.7 | 68.3 | 65.9 | 67.2 | 173294801 | model \| metrics |
Acronyms:

* `WC1`: with confidence estimation model type 1 for `U` and `V`
* `WC2`: with confidence estimation model type 2 for `U` and `V`
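For intuition, the type 1 (`WC1`) models predict a per-pixel isotropic uncertainty `sigma` together with the UV values and are trained with a Gaussian negative log-likelihood. The sketch below is our reading of that objective (up to constants), not the project's actual loss code:

```
import torch

def uv_nll_iso(u_pred, v_pred, sigma, u_gt, v_gt):
    # Gaussian NLL with a per-pixel isotropic uncertainty sigma (constant
    # terms dropped): pixels the network is unsure about are allowed a
    # larger UV error at the price of the log-sigma penalty.
    sq_err = (u_pred - u_gt) ** 2 + (v_pred - v_gt) ** 2
    return (sq_err / (2 * sigma ** 2) + 2 * torch.log(sigma)).mean()
```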
### Baselines with Mask Confidence Estimation

These models estimate confidence in the regressed UV coordinates as well as confidences associated with the coarse and fine segmentations; see [Sanakoyeu et al., 2020](https://arxiv.org/pdf/2003.00080.pdf) for details. A sketch of how these extra outputs can be inspected is given at the end of this subsection.
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_WC1M_s1x | s1x | 0.381 | 0.066 | 4.8 | 60.6 | 66.7 | 64.0 | 65.4 | 217144516 | model \| metrics |
| R_50_FPN_WC2M_s1x | s1x | 0.342 | 0.068 | 5.0 | 60.7 | 66.9 | 64.2 | 65.5 | 216245640 | model \| metrics |
| R_50_FPN_DL_WC1M_s1x | s1x | 0.371 | 0.068 | 6.0 | 60.7 | 68.0 | 65.2 | 66.7 | 216245703 | model \| metrics |
| R_50_FPN_DL_WC2M_s1x | s1x | 0.385 | 0.071 | 6.1 | 60.8 | 68.1 | 65.0 | 66.4 | 216245758 | model \| metrics |
| R_101_FPN_WC1M_s1x | s1x | 0.423 | 0.079 | 5.9 | 62.0 | 67.3 | 64.8 | 66.0 | 216453687 | model \| metrics |
| R_101_FPN_WC2M_s1x | s1x | 0.436 | 0.080 | 5.9 | 62.5 | 67.4 | 64.5 | 66.0 | 216245682 | model \| metrics |
| R_101_FPN_DL_WC1M_s1x | s1x | 0.453 | 0.079 | 6.8 | 62.0 | 68.1 | 66.4 | 67.1 | 216245771 | model \| metrics |
| R_101_FPN_DL_WC2M_s1x | s1x | 0.464 | 0.080 | 6.9 | 61.9 | 68.2 | 66.1 | 67.1 | 216245790 | model \| metrics |
Acronyms:

* `WC1M`: with confidence estimation model type 1 for `U` and `V` and mask confidence estimation
* `WC2M`: with confidence estimation model type 2 for `U` and `V` and mask confidence estimation
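Assuming a `predictor` built as in the inference example above, but from one of the `WC*M` configs, the extra confidence outputs can be inspected alongside the chart predictions. The field names below follow the DensePose predictor outputs but may differ between versions:

```
import cv2

image = cv2.imread("image.jpg")
outputs = predictor(image)["instances"]  # predictor built from a WC*M config
dp = outputs.pred_densepose
# Per-pixel UV confidence:
print(dp.sigma_2.shape)
# Confidences associated with the fine and coarse segmentations:
print(dp.fine_segm_confidence.shape)
print(dp.coarse_segm_confidence.shape)
```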
### Bootstrapping Baselines

Master and student models trained using the bootstrapping pipeline with chimpanzee as the target category; see [Sanakoyeu et al., 2020](https://arxiv.org/pdf/2003.00080.pdf) and [Bootstrapping Pipeline](BOOTSTRAPPING_PIPELINE.md) for details. Evaluation is performed on the [DensePose Chimps](DENSEPOSE_DATASETS.md#densepose-chimps) dataset.
| Name | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | segm AP | dp. APex GPS | dp. AP GPS | dp. AP GPSm | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R_50_FPN_DL_WC1M_3x_Atop10P_CA | 3x | 0.522 | 0.073 | 9.7 | 61.3 | 59.1 | 36.2 | 20.0 | 30.2 | 217578784 | model \| metrics |
| R_50_FPN_DL_WC1M_3x_Atop10P_CA_B_uniform | 3x | 1.939 | 0.072 | 10.1 | 60.9 | 58.5 | 37.2 | 21.5 | 31.0 | 256453729 | model \| metrics |
| R_50_FPN_DL_WC1M_3x_Atop10P_CA_B_uv | 3x | 1.985 | 0.072 | 9.6 | 61.4 | 58.9 | 38.3 | 22.2 | 32.1 | 256452095 | model \| metrics |
| R_50_FPN_DL_WC1M_3x_Atop10P_CA_B_finesegm | 3x | 2.047 | 0.072 | 10.3 | 60.9 | 58.5 | 36.7 | 20.7 | 30.7 | 256452819 | model \| metrics |
| R_50_FPN_DL_WC1M_3x_Atop10P_CA_B_coarsesegm | 3x | 1.830 | 0.070 | 9.6 | 61.3 | 59.2 | 37.9 | 21.5 | 31.6 | 256455697 | model \| metrics |
Acronyms:

* `WC1M`: with confidence estimation model type 1 for `U` and `V` and mask confidence estimation
* `Atop10P`: humans and animals from the 10 most suitable categories are used for training
* `CA`: class-agnostic training, where all annotated instances are mapped into a single category
* `B_<...>`: schedule with bootstrapping with the specified results sampling strategy

Note: the relaxed `dp. APex GPS` metric was used in [Sanakoyeu et al., 2020](https://arxiv.org/pdf/2003.00080.pdf) to evaluate DensePose results. This metric considers matches at IoU thresholds 0.2, 0.3 and 0.4 in addition to the standard ones used in the evaluation protocol. The minimum threshold is controlled by the `DENSEPOSE_EVALUATION.MIN_IOU_THRESHOLD` config option.

### License

All models available for download are licensed under the [Creative Commons Attribution-ShareAlike 3.0 license](https://creativecommons.org/licenses/by-sa/3.0/).

## References

If you use chart-based DensePose methods, please take the references from the following BibTeX entries:

DensePose bootstrapping pipeline:
```
@InProceedings{Sanakoyeu2020TransferringDensePose,
    title = {Transferring Dense Pose to Proximal Animal Classes},
    author = {Artsiom Sanakoyeu and Vasil Khalidov and Maureen S. McCarthy and Andrea Vedaldi and Natalia Neverova},
    booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2020},
}
```

DensePose with confidence estimation:
```
@InProceedings{Neverova2019DensePoseConfidences,
    title = {Correlated Uncertainty for Learning Dense Correspondences from Noisy Labels},
    author = {Neverova, Natalia and Novotny, David and Vedaldi, Andrea},
    booktitle = {Advances in Neural Information Processing Systems},
    year = {2019},
}
```

Original DensePose:
```
@InProceedings{Guler2018DensePose,
    title = {DensePose: Dense Human Pose Estimation in the Wild},
    author = {R{\i}za Alp G{\"u}ler and Natalia Neverova and Iasonas Kokkinos},
    booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2018},
}
```