Continuous Surface Embeddings for Dense Pose Estimation for Humans and Animals

Overview

The pipeline uses the Faster R-CNN with Feature Pyramid Network (FPN) meta-architecture outlined in Figure 1. For each detected object, the model predicts a coarse segmentation S (2 channels: foreground / background) and an embedding E (16 channels). In parallel, the embedder produces vertex embeddings Ê for the corresponding mesh. The universal positional embeddings E and the vertex embeddings Ê are then matched to derive a continuous surface embedding for each pixel.

Figure 1. DensePose continuous surface embeddings architecture based on Faster R-CNN with Feature Pyramid Network (FPN).
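The matching step described in the overview can be sketched as a nearest-neighbor lookup between pixel and vertex embeddings. This is a minimal illustration, not the DensePose implementation. The function name `match_pixels_to_vertices`, the use of squared Euclidean distance on L2-normalized embeddings, and the choice of channel 1 of S as foreground are all assumptions made for the example.

```python
import numpy as np


def match_pixels_to_vertices(E, E_hat, S):
    """Assign each foreground pixel the index of its closest mesh vertex.

    E:     (D, H, W) per-pixel embeddings (D = 16 in the models listed here).
    E_hat: (V, D) vertex embeddings for the mesh.
    S:     (2, H, W) coarse segmentation scores; channel 1 is ASSUMED
           to be foreground (the channel order is not stated in the text).
    Returns an (H, W) integer array of vertex indices, with -1 for background.
    """
    D, H, W = E.shape
    foreground = S.argmax(axis=0) == 1

    # Flatten pixels to rows and L2-normalize both sets of embeddings
    # (normalization is an assumption of this sketch).
    pix = E.reshape(D, -1).T  # (H*W, D)
    pix = pix / np.maximum(np.linalg.norm(pix, axis=1, keepdims=True), 1e-6)
    verts = E_hat / np.maximum(np.linalg.norm(E_hat, axis=1, keepdims=True), 1e-6)

    # Squared Euclidean distances via |p|^2 - 2 p.v + |v|^2, shape (H*W, V).
    d2 = (
        (pix ** 2).sum(axis=1)[:, None]
        - 2.0 * pix @ verts.T
        + (verts ** 2).sum(axis=1)[None, :]
    )

    nearest = d2.argmin(axis=1).reshape(H, W)
    return np.where(foreground, nearest, -1)
```

For a full-resolution crop and a mesh with many vertices, the distance matrix can become large, so a practical version would likely process pixels in chunks.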

Datasets

For more details on datasets used for training and validation of continuous surface embeddings models, please refer to the DensePose Datasets page.

Model Zoo and Baselines

Human CSE Models

Continuous surface embeddings models for humans, trained using the protocols from Neverova et al., 2020.

Models trained with hard assignment loss ℒ:

Name	lr sched	train time (s/iter)	inference time (s/im)	train mem (GB)	box AP	segm AP	dp. AP GPS	dp. AP GPSm	model id	download
R_50_FPN_s1x	s1x	0.349	0.060	6.3	61.1	67.1	64.4	65.7	251155172	model | metrics
R_101_FPN_s1x	s1x	0.461	0.071	7.4	62.3	67.2	64.7	65.8	251155500	model | metrics
R_50_FPN_DL_s1x	s1x	0.399	0.061	7.0	60.8	67.8	65.5	66.4	251156349	model | metrics
R_101_FPN_DL_s1x	s1x	0.504	0.074	8.3	61.5	68.0	65.6	66.6	251156606	model | metrics

Models trained with soft assignment loss ℒ_σ:

Name	lr sched	train time (s/iter)	inference time (s/im)	train mem (GB)	box AP	segm AP	dp. AP GPS	dp. AP GPSm	model id	download
R_50_FPN_soft_s1x	s1x	0.357	0.057	9.7	61.3	66.9	64.3	65.4	250533982	model | metrics
R_101_FPN_soft_s1x	s1x	0.464	0.071	10.5	62.1	67.3	64.5	66.0	250712522	model | metrics
R_50_FPN_DL_soft_s1x	s1x	0.427	0.062	11.3	60.8	68.0	66.1	66.7	250713703	model | metrics
R_101_FPN_DL_soft_s1x	s1x	0.483	0.071	12.2	61.5	68.2	66.2	67.1	250713061	model | metrics

Animal CSE Models

Models obtained by finetuning human CSE models on animal data from ds1_train (see the DensePose LVIS section for more details on the datasets) with soft assignment loss ℒ_σ:

Name	lr sched	train time (s/iter)	inference time (s/im)	train mem (GB)	box AP	segm AP	dp. AP GPS	dp. AP GPSm	model id	download
R_50_FPN_soft_chimps_finetune_4k	4k	0.569	0.051	4.7	62.0	59.0	32.2	39.6	253146869	model | metrics
R_50_FPN_soft_animals_finetune_4k	4k	0.381	0.061	7.3	44.9	55.5	21.3	28.8	253145793	model | metrics
R_50_FPN_soft_animals_CA_finetune_4k	4k	0.412	0.059	7.1	53.4	59.5	25.4	33.4	253498611	model | metrics

Acronyms:

CA: class agnostic training, where all annotated instances are mapped into a single category
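As a rough illustration of what class-agnostic mapping means for the training data, the sketch below collapses all category labels into one. It assumes COCO-style annotation dictionaries with a `category_id` field. The function name `to_class_agnostic` and the default ID of 1 are hypothetical and are not taken from the DensePose code.

```python
from copy import deepcopy


def to_class_agnostic(annotations, category_id=1):
    """Return a copy of the annotations with every instance in one category.

    annotations: list of dicts, each ASSUMED to carry a "category_id" key
                 (COCO-style); all other fields are left untouched.
    category_id: the single category all instances are mapped to
                 (hypothetical default, chosen for this example).
    """
    remapped = deepcopy(annotations)  # keep the caller's data unchanged
    for ann in remapped:
        ann["category_id"] = category_id
    return remapped
```

With this mapping the detector only has to separate annotated animals from background, and category-specific labels play no role during training.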

Models obtained by finetuning human CSE models on animal data from the ds2_train dataset with soft assignment loss ℒ_σ and, for some schedules, cycle losses. Please refer to the DensePose LVIS section for details on the dataset and to Neverova et al., 2021 for details on cycle losses.

Name	lr sched	train time (s/iter)	inference time (s/im)	train mem (GB)	box AP	segm AP	dp. AP GPS	dp. AP GPSm	GErr	GPS	model id	download
R_50_FPN_soft_animals_I0_finetune_16k	16k	0.386	0.058	8.4	54.2	67.0	29.0	38.6	13.2	85.4	270727112	model | metrics
R_50_FPN_soft_animals_I0_finetune_m2m_16k	16k	0.508	0.056	12.2	54.1	67.3	28.6	38.4	12.5	87.6	270982215	model | metrics
R_50_FPN_soft_animals_I0_finetune_i2m_16k	16k	0.483	0.056	9.7	54.0	66.6	28.9	38.3	11.0	88.9	270727461	model | metrics

References

If you use DensePose methods based on continuous surface embeddings, please cite the corresponding works using the following BibTeX entries:

Continuous surface embeddings:

@InProceedings{Neverova2020ContinuousSurfaceEmbeddings,
    title = {Continuous Surface Embeddings},
    author = {Neverova, Natalia and Novotny, David and Khalidov, Vasil and Szafraniec, Marc and Labatut, Patrick and Vedaldi, Andrea},
    booktitle = {Advances in Neural Information Processing Systems},
    year = {2020},
}

Cycle Losses:

@InProceedings{Neverova2021UniversalCanonicalMaps,
    title = {Discovering Relationships between Object Categories via Universal Canonical Maps},
    author = {Neverova, Natalia and Sanakoyeu, Artsiom and Novotny, David and Labatut, Patrick and Vedaldi, Andrea},
    booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2021},
}