# Bootstrapping Pipeline
The bootstrapping pipeline for DensePose was proposed in
[Sanakoyeu et al., 2020](https://arxiv.org/pdf/2003.00080.pdf)
to extend DensePose from humans to proximal animal classes
(chimpanzees). Currently, the pipeline is only implemented for
[chart-based models](DENSEPOSE_IUV.md).
Bootstrapping proceeds in two steps.
## Master Model Training
The master model is trained on data from the source domain (humans)
and the supporting domain (animals). Instances from the source domain
carry full DensePose annotations (`S`, `I`, `U` and `V`), while
instances from the supporting domain have segmentation annotations only.
To ensure segmentation quality in the target domain, only a subset of
supporting domain classes is included in the training. This is achieved
through category filters, e.g.
(see [configs/evolution/Base-RCNN-FPN-Atop10P_CA.yaml](../configs/evolution/Base-RCNN-FPN-Atop10P_CA.yaml)):
```
WHITELISTED_CATEGORIES:
  "base_coco_2017_train":
    - 1  # person
    - 16 # bird
    - 17 # cat
    - 18 # dog
    - 19 # horse
    - 20 # sheep
    - 21 # cow
    - 22 # elephant
    - 23 # bear
    - 24 # zebra
    - 25 # giraffe
```
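To illustrate what such a filter does, here is a minimal Python sketch; the function and variable names are hypothetical, not the actual DensePose implementation:
```
# Hypothetical sketch of whitelist-based category filtering;
# the real logic lives in the DensePose dataset loading code.
from typing import Any, Dict, List, Set


def filter_by_whitelist(
    dataset_dicts: List[Dict[str, Any]], whitelist: Set[int]
) -> List[Dict[str, Any]]:
    """Keep only instances whose category_id is whitelisted;
    drop images left with no instances."""
    filtered = []
    for record in dataset_dicts:
        annotations = [
            a for a in record.get("annotations", [])
            if a["category_id"] in whitelist
        ]
        if annotations:
            filtered.append({**record, "annotations": annotations})
    return filtered


# The Atop10P whitelist from the config above: person + top 10 animals
ATOP10P = {1, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25}
```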
The acronym `Atop10P` in config file names indicates that categories are filtered to
contain only the top 10 animal categories and person.
The training is performed in a *class-agnostic* manner: all instances
are mapped into the same class (person), e.g.
(see [configs/evolution/Base-RCNN-FPN-Atop10P_CA.yaml](../configs/evolution/Base-RCNN-FPN-Atop10P_CA.yaml)):
```
CATEGORY_MAPS:
  "base_coco_2017_train":
    "16": 1 # bird -> person
    "17": 1 # cat -> person
    "18": 1 # dog -> person
    "19": 1 # horse -> person
    "20": 1 # sheep -> person
    "21": 1 # cow -> person
    "22": 1 # elephant -> person
    "23": 1 # bear -> person
    "24": 1 # zebra -> person
    "25": 1 # giraffe -> person
```
The acronym `CA` in config file names indicates that the training is class-agnostic.
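A minimal sketch of what this class-agnostic remapping amounts to (hypothetical names, not the actual implementation):
```
# Hypothetical sketch of class-agnostic category remapping.
from typing import Any, Dict, List, Mapping


def apply_category_map(
    dataset_dicts: List[Dict[str, Any]], category_map: Mapping[int, int]
) -> None:
    """Remap instance categories in place, e.g. all animals -> person."""
    for record in dataset_dicts:
        for anno in record.get("annotations", []):
            cid = anno["category_id"]
            anno["category_id"] = category_map.get(cid, cid)


# The CATEGORY_MAPS entry above as a Python dict: ids 16..25 -> 1 (person)
ANIMALS_TO_PERSON = {cid: 1 for cid in range(16, 26)}
```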
## Student Model Training
The student model is trained on data from the source domain (humans),
the supporting domain (animals) and the target domain (chimpanzees).
Annotations in the source and supporting domains are the same as the ones
used for master model training.
Annotations in the target domain are obtained by applying the master model
to images that contain instances from the target category and sampling
sparse annotations from the dense results. This process is called *bootstrapping*.
Below we give details on how the bootstrapping pipeline is implemented.
### Data Loaders
The central components that enable bootstrapping are
[`InferenceBasedLoader`](../densepose/data/inference_based_loader.py) and
[`CombinedDataLoader`](../densepose/data/combined_loader.py).
`InferenceBasedLoader` takes images from a data loader, applies a model
to the images, filters the model outputs based on the selected criteria and
samples the filtered outputs to produce annotations.
`CombinedDataLoader` combines data obtained from the loaders based on specified
ratios. The standard data loader has a default ratio of 1.0;
ratios for bootstrap datasets are specified in the configuration file.
The higher the ratio, the higher the probability that samples from that
particular data loader are included in a batch.
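The ratio-based combination can be sketched as follows; this is a simplified illustration assuming infinitely repeating per-sample iterators, not the actual [`CombinedDataLoader`](../densepose/data/combined_loader.py) code:
```
# Simplified sketch of ratio-based loader combination, assuming each
# loader is an infinite iterator over individual samples.
import random
from typing import Any, Iterator, List, Sequence


def combined_batches(
    loaders: Sequence[Iterator[Any]],
    ratios: Sequence[float],
    batch_size: int,
) -> Iterator[List[Any]]:
    """Fill each batch by drawing every element from a loader chosen
    with probability proportional to its ratio."""
    while True:
        yield [
            next(random.choices(loaders, weights=ratios, k=1)[0])
            for _ in range(batch_size)
        ]
```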
Here is an example of the bootstrapping configuration taken from
[`configs/evolution/densepose_R_50_FPN_DL_WC1M_3x_Atop10P_CA_B_uniform.yaml`](../configs/evolution/densepose_R_50_FPN_DL_WC1M_3x_Atop10P_CA_B_uniform.yaml):
```
BOOTSTRAP_DATASETS:
  - DATASET: "chimpnsee"
    RATIO: 1.0
    IMAGE_LOADER:
      TYPE: "video_keyframe"
      SELECT:
        STRATEGY: "random_k"
        NUM_IMAGES: 4
      TRANSFORM:
        TYPE: "resize"
        MIN_SIZE: 800
        MAX_SIZE: 1333
      BATCH_SIZE: 8
      NUM_WORKERS: 1
    INFERENCE:
      INPUT_BATCH_SIZE: 1
      OUTPUT_BATCH_SIZE: 1
    DATA_SAMPLER:
      # supported types:
      #   densepose_uniform
      #   densepose_UV_confidence
      #   densepose_fine_segm_confidence
      #   densepose_coarse_segm_confidence
      TYPE: "densepose_uniform"
      COUNT_PER_CLASS: 8
    FILTER:
      TYPE: "detection_score"
      MIN_VALUE: 0.8
BOOTSTRAP_MODEL:
  WEIGHTS: https://dl.fbaipublicfiles.com/densepose/evolution/densepose_R_50_FPN_DL_WC1M_3x_Atop10P_CA/217578784/model_final_9fe1cc.pkl
```
The above example has one bootstrap dataset (`chimpnsee`). This dataset is registered as
a [VIDEO_LIST](../densepose/data/datasets/chimpnsee.py) dataset, which means that
it consists of a number of videos specified in a text file. Different strategies can be
used to sample individual images from a video. Here we use the `video_keyframe` strategy,
which considers only keyframes; this ensures a temporal offset between sampled images and
faster seek operations. We select at most 4 random keyframes from each video:
```
SELECT:
  STRATEGY: "random_k"
  NUM_IMAGES: 4
```
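A minimal sketch of what `random_k` selection amounts to (illustrative only, not the actual strategy code):
```
# Illustrative sketch of the "random_k" strategy: sample at most
# NUM_IMAGES keyframes per video without replacement.
import random
from typing import List, Sequence


def select_random_k(keyframe_ids: Sequence[int], num_images: int = 4) -> List[int]:
    return random.sample(list(keyframe_ids), min(num_images, len(keyframe_ids)))
```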
The frames are then resized
```
TRANSFORM:
  TYPE: "resize"
  MIN_SIZE: 800
  MAX_SIZE: 1333
```
and batched using the standard
[PyTorch DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader):
```
BATCH_SIZE: 8
NUM_WORKERS: 1
```
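The resize transform typically follows the shortest-edge rule common in detectron2: scale the image so that its short side equals `MIN_SIZE`, unless that would push the long side beyond `MAX_SIZE`, in which case the long side is capped instead. A sketch of this rule (illustrative, not the actual transform code):
```
# Illustrative sketch of the min/max-size resize rule.
def resize_scale(height: int, width: int,
                 min_size: int = 800, max_size: int = 1333) -> float:
    """Scale factor that makes the short side min_size,
    capped so the long side does not exceed max_size."""
    scale = min_size / min(height, width)
    if scale * max(height, width) > max_size:
        scale = max_size / max(height, width)
    return scale


# e.g. a 1080x1920 keyframe: 800/1080 would give a long side of ~1422,
# so the cap applies and scale = 1333/1920 ≈ 0.69
```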
`InferenceBasedLoader` decomposes those batches into batches of size `INPUT_BATCH_SIZE`
and applies the master model specified by `BOOTSTRAP_MODEL`. Model outputs are filtered
by detection score:
```
FILTER:
  TYPE: "detection_score"
  MIN_VALUE: 0.8
```
and sampled using the specified sampling strategy:
```
DATA_SAMPLER:
  # supported types:
  #   densepose_uniform
  #   densepose_UV_confidence
  #   densepose_fine_segm_confidence
  #   densepose_coarse_segm_confidence
TYPE: "densepose_uniform"
  COUNT_PER_CLASS: 8
```
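The batch decomposition and score filtering described above can be sketched as follows; the names are hypothetical, and the actual logic lives in [`InferenceBasedLoader`](../densepose/data/inference_based_loader.py):
```
# Hypothetical sketch of batch decomposition and detection-score filtering.
from typing import Any, Iterator, List


def decompose(images: List[Any], input_batch_size: int = 1) -> Iterator[List[Any]]:
    """Split a loader batch into sub-batches for model inference."""
    for i in range(0, len(images), input_batch_size):
        yield images[i : i + input_batch_size]


def filter_by_score(detections: List[Any], min_value: float = 0.8) -> List[Any]:
    """Keep only detections whose (assumed) .score passes the threshold."""
    return [d for d in detections if d.score >= min_value]
```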
The current implementation supports
[uniform sampling](../densepose/data/samplers/densepose_uniform.py) and
[confidence-based sampling](../densepose/data/samplers/densepose_confidence_based.py)
to obtain sparse annotations from dense results. For confidence-based
sampling one needs to use a master model that produces confidence estimates.
The `WC1M` master model used in the example above produces all three types of confidence
estimates.
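As a rough illustration, uniform sampling can be thought of as picking random foreground pixels from the dense predictions and recording the `I`, `U` and `V` values at those pixels. Below is a sketch with illustrative array names, not the actual sampler:
```
# Illustrative sketch of uniform sampling of sparse annotations from
# dense predictions; array names are assumptions, not the real API.
import numpy as np


def sample_uniform(fg_mask: np.ndarray, i_map: np.ndarray,
                   u_map: np.ndarray, v_map: np.ndarray,
                   count_per_class: int = 8) -> list:
    """fg_mask: HxW bool foreground; i/u/v_map: HxW dense predictions."""
    ys, xs = np.nonzero(fg_mask)
    if len(ys) == 0:
        return []
    rng = np.random.default_rng()
    picks = rng.choice(len(ys), size=min(count_per_class, len(ys)), replace=False)
    return [
        {
            "x": int(xs[k]), "y": int(ys[k]),
            "i": int(i_map[ys[k], xs[k]]),
            "u": float(u_map[ys[k], xs[k]]),
            "v": float(v_map[ys[k], xs[k]]),
        }
        for k in picks
    ]
```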
Finally, the sampled data is grouped into batches of size `OUTPUT_BATCH_SIZE`:
```
INFERENCE:
  INPUT_BATCH_SIZE: 1
  OUTPUT_BATCH_SIZE: 1
```
The proportion of data from the annotated datasets and the bootstrapped dataset can be tracked
in the logs, e.g.:
```
[... densepose.engine.trainer]: batch/ 1.8, batch/base_coco_2017_train 6.4, batch/densepose_coco_2014_train 3.85
```
which means that over the last 20 iterations, on average for 1.8 bootstrapped data samples there were 6.4 samples from `base_coco_2017_train` and 3.85 samples from `densepose_coco_2014_train`.