Spaces:
Running
Running
# Use Custom Datasets | |
This document explains how the dataset APIs | |
([DatasetCatalog](../modules/data.html#detectron2.data.DatasetCatalog), [MetadataCatalog](../modules/data.html#detectron2.data.MetadataCatalog)) | |
work, and how to use them to add custom datasets. | |
Datasets that have builtin support in detectron2 are listed in [builtin datasets](builtin_datasets.md). | |
If you want to use a custom dataset while also reusing detectron2's data loaders, | |
you will need to: | |
1. __Register__ your dataset (i.e., tell detectron2 how to obtain your dataset). | |
2. Optionally, __register metadata__ for your dataset. | |
Next, we explain the above two concepts in detail. | |
The [Colab tutorial](https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5) | |
has a live example of how to register and train on a dataset of custom formats. | |
### Register a Dataset | |
To let detectron2 know how to obtain a dataset named "my_dataset", users need to implement | |
a function that returns the items in your dataset and then tell detectron2 about this | |
function: | |
```python | |
def my_dataset_function(): | |
... | |
return list[dict] in the following format | |
from detectron2.data import DatasetCatalog | |
DatasetCatalog.register("my_dataset", my_dataset_function) | |
# later, to access the data: | |
data: List[Dict] = DatasetCatalog.get("my_dataset") | |
``` | |
Here, the snippet associates a dataset named "my_dataset" with a function that returns the data. | |
The function must return the same data (with same order) if called multiple times. | |
The registration stays effective until the process exits. | |
The function can do arbitrary things and should return the data in `list[dict]`, each dict in either | |
of the following formats: | |
1. Detectron2's standard dataset dict, described below. This will make it work with many other builtin | |
features in detectron2, so it's recommended to use it when it's sufficient. | |
2. Any custom format. You can also return arbitrary dicts in your own format, | |
such as adding extra keys for new tasks. | |
Then you will need to handle them properly downstream as well. | |
See below for more details. | |
#### Standard Dataset Dicts | |
For standard tasks | |
(instance detection, instance/semantic/panoptic segmentation, keypoint detection), | |
we load the original dataset into `list[dict]` with a specification similar to COCO's annotations. | |
This is our standard representation for a dataset. | |
Each dict contains information about one image. | |
The dict may have the following fields, | |
and the required fields vary based on what the dataloader or the task needs (see more below). | |
```eval_rst | |
.. list-table:: | |
:header-rows: 1 | |
* - Task | |
- Fields | |
* - Common | |
- file_name, height, width, image_id | |
* - Instance detection/segmentation | |
- annotations | |
* - Semantic segmentation | |
- sem_seg_file_name | |
* - Panoptic segmentation | |
- pan_seg_file_name, segments_info | |
``` | |
+ `file_name`: the full path to the image file. | |
+ `height`, `width`: integer. The shape of the image. | |
+ `image_id` (str or int): a unique id that identifies this image. Required by many | |
evaluators to identify the images, but a dataset may use it for different purposes. | |
+ `annotations` (list[dict]): Required by __instance detection/segmentation or keypoint detection__ tasks. | |
Each dict corresponds to annotations of one instance in this image, and | |
may contain the following keys: | |
+ `bbox` (list[float], required): list of 4 numbers representing the bounding box of the instance. | |
+ `bbox_mode` (int, required): the format of bbox. It must be a member of | |
[structures.BoxMode](../modules/structures.html#detectron2.structures.BoxMode). | |
Currently supports: `BoxMode.XYXY_ABS`, `BoxMode.XYWH_ABS`. | |
+ `category_id` (int, required): an integer in the range [0, num_categories-1] representing the category label. | |
The value num_categories is reserved to represent the "background" category, if applicable. | |
+ `segmentation` (list[list[float]] or dict): the segmentation mask of the instance. | |
+ If `list[list[float]]`, it represents a list of polygons, one for each connected component | |
of the object. Each `list[float]` is one simple polygon in the format of `[x1, y1, ..., xn, yn]` (nβ₯3). | |
The Xs and Ys are absolute coordinates in unit of pixels. | |
+ If `dict`, it represents the per-pixel segmentation mask in COCO's compressed RLE format. | |
The dict should have keys "size" and "counts". You can convert a uint8 segmentation mask of 0s and | |
1s into such dict by `pycocotools.mask.encode(np.asarray(mask, order="F"))`. | |
`cfg.INPUT.MASK_FORMAT` must be set to `bitmask` if using the default data loader with such format. | |
+ `keypoints` (list[float]): in the format of [x1, y1, v1,..., xn, yn, vn]. | |
v[i] means the [visibility](http://cocodataset.org/#format-data) of this keypoint. | |
`n` must be equal to the number of keypoint categories. | |
The Xs and Ys are absolute real-value coordinates in range [0, W or H]. | |
(Note that the keypoint coordinates in COCO format are integers in range [0, W-1 or H-1], which is different | |
from our standard format. Detectron2 adds 0.5 to COCO keypoint coordinates to convert them from discrete | |
pixel indices to floating point coordinates.) | |
+ `iscrowd`: 0 (default) or 1. Whether this instance is labeled as COCO's "crowd | |
region". Don't include this field if you don't know what it means. | |
If `annotations` is an empty list, it means the image is labeled to have no objects. | |
Such images will by default be removed from training, | |
but can be included using `DATALOADER.FILTER_EMPTY_ANNOTATIONS`. | |
+ `sem_seg_file_name` (str): | |
The full path to the semantic segmentation ground truth file. | |
It should be a grayscale image whose pixel values are integer labels. | |
+ `pan_seg_file_name` (str): | |
The full path to panoptic segmentation ground truth file. | |
It should be an RGB image whose pixel values are integer ids encoded using the | |
[panopticapi.utils.id2rgb](https://github.com/cocodataset/panopticapi/) function. | |
The ids are defined by `segments_info`. | |
If an id does not appear in `segments_info`, the pixel is considered unlabeled | |
and is usually ignored in training & evaluation. | |
+ `segments_info` (list[dict]): defines the meaning of each id in panoptic segmentation ground truth. | |
Each dict has the following keys: | |
+ `id` (int): integer that appears in the ground truth image. | |
+ `category_id` (int): an integer in the range [0, num_categories-1] representing the category label. | |
+ `iscrowd`: 0 (default) or 1. Whether this instance is labeled as COCO's "crowd region". | |
```eval_rst | |
.. note:: | |
The PanopticFPN model does not use the panoptic segmentation | |
format defined here, but a combination of both instance segmentation and semantic segmentation data | |
format. See :doc:`builtin_datasets` for instructions on COCO. | |
``` | |
Fast R-CNN (with pre-computed proposals) models are rarely used today. | |
To train a Fast R-CNN, the following extra keys are needed: | |
+ `proposal_boxes` (array): 2D numpy array with shape (K, 4) representing K precomputed proposal boxes for this image. | |
+ `proposal_objectness_logits` (array): numpy array with shape (K, ), which corresponds to the objectness | |
logits of proposals in 'proposal_boxes'. | |
+ `proposal_bbox_mode` (int): the format of the precomputed proposal bbox. | |
It must be a member of | |
[structures.BoxMode](../modules/structures.html#detectron2.structures.BoxMode). | |
Default is `BoxMode.XYXY_ABS`. | |
#### Custom Dataset Dicts for New Tasks | |
In the `list[dict]` that your dataset function returns, the dictionary can also have __arbitrary custom data__. | |
This will be useful for a new task that needs extra information not covered | |
by the standard dataset dicts. In this case, you need to make sure the downstream code can handle your data | |
correctly. Usually this requires writing a new `mapper` for the dataloader (see [Use Custom Dataloaders](./data_loading.md)). | |
When designing a custom format, note that all dicts are stored in memory | |
(sometimes serialized and with multiple copies). | |
To save memory, each dict is meant to contain __small__ but sufficient information | |
about each sample, such as file names and annotations. | |
Loading full samples typically happens in the data loader. | |
For attributes shared among the entire dataset, use `Metadata` (see below). | |
To avoid extra memory, do not save such information inside each sample. | |
### "Metadata" for Datasets | |
Each dataset is associated with some metadata, accessible through | |
`MetadataCatalog.get(dataset_name).some_metadata`. | |
Metadata is a key-value mapping that contains information that's shared among | |
the entire dataset, and usually is used to interpret what's in the dataset, e.g., | |
names of classes, colors of classes, root of files, etc. | |
This information will be useful for augmentation, evaluation, visualization, logging, etc. | |
The structure of metadata depends on what is needed from the corresponding downstream code. | |
If you register a new dataset through `DatasetCatalog.register`, | |
you may also want to add its corresponding metadata through | |
`MetadataCatalog.get(dataset_name).some_key = some_value`, to enable any features that need the metadata. | |
You can do it like this (using the metadata key "thing_classes" as an example): | |
```python | |
from detectron2.data import MetadataCatalog | |
MetadataCatalog.get("my_dataset").thing_classes = ["person", "dog"] | |
``` | |
Here is a list of metadata keys that are used by builtin features in detectron2. | |
If you add your own dataset without these metadata, some features may be | |
unavailable to you: | |
* `thing_classes` (list[str]): Used by all instance detection/segmentation tasks. | |
A list of names for each instance/thing category. | |
If you load a COCO format dataset, it will be automatically set by the function `load_coco_json`. | |
* `thing_colors` (list[tuple(r, g, b)]): Pre-defined color (in [0, 255]) for each thing category. | |
Used for visualization. If not given, random colors will be used. | |
* `stuff_classes` (list[str]): Used by semantic and panoptic segmentation tasks. | |
A list of names for each stuff category. | |
* `stuff_colors` (list[tuple(r, g, b)]): Pre-defined color (in [0, 255]) for each stuff category. | |
Used for visualization. If not given, random colors are used. | |
* `ignore_label` (int): Used by semantic and panoptic segmentation tasks. Pixels in ground-truth | |
annotations with this category label should be ignored in evaluation. Typically these are "unlabeled" | |
pixels. | |
* `keypoint_names` (list[str]): Used by keypoint detection. A list of names for each keypoint. | |
* `keypoint_flip_map` (list[tuple[str]]): Used by keypoint detection. A list of pairs of names, | |
where each pair are the two keypoints that should be flipped if the image is | |
flipped horizontally during augmentation. | |
* `keypoint_connection_rules`: list[tuple(str, str, (r, g, b))]. Each tuple specifies a pair of keypoints | |
that are connected and the color (in [0, 255]) to use for the line between them when visualized. | |
Some additional metadata that are specific to the evaluation of certain datasets (e.g. COCO): | |
* `thing_dataset_id_to_contiguous_id` (dict[int->int]): Used by all instance detection/segmentation tasks in the COCO format. | |
A mapping from instance class ids in the dataset to contiguous ids in range [0, #class). | |
Will be automatically set by the function `load_coco_json`. | |
* `stuff_dataset_id_to_contiguous_id` (dict[int->int]): Used when generating prediction json files for | |
semantic/panoptic segmentation. | |
A mapping from semantic segmentation class ids in the dataset | |
to contiguous ids in [0, num_categories). It is useful for evaluation only. | |
* `json_file`: The COCO annotation json file. Used by COCO evaluation for COCO-format datasets. | |
* `panoptic_root`, `panoptic_json`: Used by COCO-format panoptic evaluation. | |
* `evaluator_type`: Used by the builtin main training script to select | |
evaluator. Don't use it in a new training script. | |
You can just provide the [DatasetEvaluator](../modules/evaluation.html#detectron2.evaluation.DatasetEvaluator) | |
for your dataset directly in your main script. | |
```eval_rst | |
.. note:: | |
In recognition, sometimes we use the term "thing" for instance-level tasks, | |
and "stuff" for semantic segmentation tasks. | |
Both are used in panoptic segmentation tasks. | |
For background on the concept of "thing" and "stuff", see | |
`On Seeing Stuff: The Perception of Materials by Humans and Machines | |
<http://persci.mit.edu/pub_pdfs/adelson_spie_01.pdf>`_. | |
``` | |
### Register a COCO Format Dataset | |
If your instance-level (detection, segmentation, keypoint) dataset is already a json file in the COCO format, | |
the dataset and its associated metadata can be registered easily with: | |
```python | |
from detectron2.data.datasets import register_coco_instances | |
register_coco_instances("my_dataset", {}, "json_annotation.json", "path/to/image/dir") | |
``` | |
If your dataset is in COCO format but need to be further processed, or has extra custom per-instance annotations, | |
the [load_coco_json](../modules/data.html#detectron2.data.datasets.load_coco_json) | |
function might be useful. | |
### Update the Config for New Datasets | |
Once you've registered the dataset, you can use the name of the dataset (e.g., "my_dataset" in | |
example above) in `cfg.DATASETS.{TRAIN,TEST}`. | |
There are other configs you might want to change to train or evaluate on new datasets: | |
* `MODEL.ROI_HEADS.NUM_CLASSES` and `MODEL.RETINANET.NUM_CLASSES` are the number of thing classes | |
for R-CNN and RetinaNet models, respectively. | |
* `MODEL.ROI_KEYPOINT_HEAD.NUM_KEYPOINTS` sets the number of keypoints for Keypoint R-CNN. | |
You'll also need to set [Keypoint OKS](http://cocodataset.org/#keypoints-eval) | |
with `TEST.KEYPOINT_OKS_SIGMAS` for evaluation. | |
* `MODEL.SEM_SEG_HEAD.NUM_CLASSES` sets the number of stuff classes for Semantic FPN & Panoptic FPN. | |
* `TEST.DETECTIONS_PER_IMAGE` controls the maximum number of objects to be detected. | |
Set it to a larger number if test images may contain >100 objects. | |
* If you're training Fast R-CNN (with precomputed proposals), `DATASETS.PROPOSAL_FILES_{TRAIN,TEST}` | |
need to match the datasets. The format of proposal files are documented | |
[here](../modules/data.html#detectron2.data.load_proposals_into_dataset). | |
New models | |
(e.g. [TensorMask](../../projects/TensorMask), | |
[PointRend](../../projects/PointRend)) | |
often have similar configs of their own that need to be changed as well. | |
```eval_rst | |
.. tip:: | |
After changing the number of classes, certain layers in a pre-trained model will become incompatible | |
and therefore cannot be loaded to the new model. | |
This is expected, and loading such pre-trained models will produce warnings about such layers. | |
``` | |