## Prepare Datasets for OVSeg
This doc is a modification/extension of the [MaskFormer](https://github.com/facebookresearch/MaskFormer/blob/main/datasets/README.md) dataset guide, following the [Detectron2 format](https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html).
A dataset can be used by accessing [DatasetCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.DatasetCatalog)
for its data, or [MetadataCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.MetadataCatalog) for its metadata (class names, etc).
This document explains how to set up the builtin datasets so they can be used by the above APIs.
[Use Custom Datasets](https://detectron2.readthedocs.io/tutorials/datasets.html) gives a deeper dive on how to use `DatasetCatalog` and `MetadataCatalog`,
and how to add new datasets to them.
OVSeg has builtin support for a few datasets.
The datasets are assumed to exist in a directory specified by the environment variable
`DETECTRON2_DATASETS`.
Under this directory, detectron2 will look for datasets in the structure described below, if needed.
```
$DETECTRON2_DATASETS/
  coco/                   # COCOStuff-171
  ADEChallengeData2016/   # ADE20K-150
  ADE20K_2021_17_01/      # ADE20K-847
  VOCdevkit/
    VOC2012/              # PASCALVOC-20
    VOC2010/              # PASCALContext-59, PASCALContext-459
```
You can set the location for builtin datasets by `export DETECTRON2_DATASETS=/path/to/datasets`.
If left unset, the default is `./datasets` relative to your current working directory.
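The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not detectron2's actual code: the helper names `resolve_dataset_root` and `present_datasets` are hypothetical, and the folder names are copied from the tree above.

```python
import os
from pathlib import Path

# Dataset folders from the layout above, relative to the dataset root.
EXPECTED_DIRS = [
    "coco",
    "ADEChallengeData2016",
    "ADE20K_2021_17_01",
    "VOCdevkit/VOC2012",
    "VOCdevkit/VOC2010",
]


def resolve_dataset_root(environ=os.environ):
    """Return $DETECTRON2_DATASETS if it is set, else ./datasets (hypothetical helper)."""
    return Path(environ.get("DETECTRON2_DATASETS", "datasets")).expanduser()


def present_datasets(root):
    """Map each expected folder name to whether it exists under root."""
    return {name: (Path(root) / name).is_dir() for name in EXPECTED_DIRS}


if __name__ == "__main__":
    root = resolve_dataset_root()
    for name, found in present_datasets(root).items():
        status = "ok     " if found else "missing"
        print(f"{status}  {root / name}")
```

Running it before training gives a quick view of which datasets are in place under the resolved root.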
Unless otherwise noted, our model is trained on COCOStuff-171 and evaluated on ADE20K-150, ADE20K-847, PASCALVOC-20, PASCALContext-59 and PASCALContext-459.
| dataset | split | # images | # categories |
|:--------------:|:---------:|:--------:|:------------:|
| COCO Stuff | train2017 | 118K | 171 |
| ADE20K | val | 2K | 150/847 |
| Pascal VOC | val | 1.5K | 20 |
| Pascal Context | val | 5K | 59/459 |
### Expected dataset structure for [COCO Stuff](https://github.com/nightrome/cocostuff):
```
coco/
  train2017/    # http://images.cocodataset.org/zips/train2017.zip
  annotations/  # http://images.cocodataset.org/annotations/annotations_trainval2017.zip
  stuffthingmaps/
    stuffthingmaps_trainval2017.zip  # http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip
    train2017/
  # below are generated
  stuffthingmaps_detectron2/
    train2017/
```
The directory `stuffthingmaps_detectron2` is generated by running `python datasets/prepare_coco_stuff_sem_seg.py`.
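Before running the preparation script, it can help to confirm that the inputs from the tree above are in place. The following is a minimal sketch under that assumption; `check_coco_stuff` is a hypothetical helper and not part of OVSeg or detectron2.

```python
from pathlib import Path

# Input folders listed in the tree above, relative to the coco/ folder.
REQUIRED_INPUTS = ["train2017", "annotations", "stuffthingmaps/train2017"]
# Output folder that the preparation script is expected to create.
GENERATED_OUTPUT = "stuffthingmaps_detectron2/train2017"


def check_coco_stuff(coco_dir):
    """Return (missing_inputs, is_generated) for a coco/ folder."""
    coco_dir = Path(coco_dir)
    missing = [p for p in REQUIRED_INPUTS if not (coco_dir / p).is_dir()]
    generated = (coco_dir / GENERATED_OUTPUT).is_dir()
    return missing, generated
```

An empty `missing` list means the script has the inputs it needs; `is_generated` turning true afterwards indicates that the output folder was created.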
### Expected dataset structure for [ADE20k Scene Parsing (ADE20K-150)](http://sceneparsing.csail.mit.edu/):
```
ADEChallengeData2016/
  annotations/
  images/
  objectInfo150.txt
  # below are generated
  annotations_detectron2/
```
The directory `annotations_detectron2` is generated by running `python datasets/prepare_ade20k_sem_seg.py`.
### Expected dataset structure for [ADE20k-Full (ADE20K-847)](https://github.com/CSAILVision/ADE20K#download):
```
ADE20K_2021_17_01/
  images/
  index_ade20k.pkl
  objects.txt
  # below are generated
  images_detectron2/
  annotations_detectron2/
```
The directories `images_detectron2` and `annotations_detectron2` are generated by running `python datasets/prepare_ade20k_full_sem_seg.py`.
### Expected dataset structure for [Pascal VOC 2012 (PASCALVOC-20)](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#devkit):
```
VOCdevkit/VOC2012/
  Annotations/
  ImageSets/
  JPEGImages/
  SegmentationClass/
  SegmentationObject/
  SegmentationClassAug/  # https://github.com/kazuto1011/deeplab-pytorch/blob/master/data/datasets/voc12/README.md
  # below are generated
  images_detectron2/
  annotations_detectron2/
```
Start from the tar file `VOCtrainval_11-May-2012.tar`.
We use the SBD-augmented training data as `SegmentationClassAug`, following [Deeplab](https://github.com/kazuto1011/deeplab-pytorch/blob/master/data/datasets/voc12/README.md).
The directories `images_detectron2` and `annotations_detectron2` are generated by running `python datasets/prepare_voc_sem_seg.py`.
### Expected dataset structure for [Pascal Context](https://www.cs.stanford.edu/~roozbeh/pascal-context/):
```
VOCdevkit/VOC2010/
  Annotations/
  ImageSets/
  JPEGImages/
  SegmentationClass/
  SegmentationObject/
  # below are from https://www.cs.stanford.edu/~roozbeh/pascal-context/trainval.tar.gz
  trainval/
  labels.txt
  59_labels.txt          # https://www.cs.stanford.edu/~roozbeh/pascal-context/59_labels.txt
  pascalcontext_val.txt  # https://drive.google.com/file/d/1BCbiOKtLvozjVnlTJX51koIveUZHCcUh/view?usp=sharing
  # below are generated
  annotations_detectron2/
    pc459_val
    pc59_val
```
Start from the tar file `VOCtrainval_03-May-2010.tar`. The list for the 5K validation set (`pascalcontext_val.txt`) can be downloaded [here](https://drive.google.com/file/d/1BCbiOKtLvozjVnlTJX51koIveUZHCcUh/view?usp=sharing).
The directory `annotations_detectron2` is generated by running `python datasets/prepare_pascal_context.py`.
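The preparation commands named in the sections above can be collected in one place. The sketch below only assembles and runs those commands. The names `build_prepare_commands` and `prepare_all` are hypothetical, and it assumes the scripts are run from the repository root and locate the data through `DETECTRON2_DATASETS` on their own. The order simply follows this document.

```python
import subprocess
import sys

# Script paths exactly as they are named in the sections above.
PREPARE_SCRIPTS = [
    "datasets/prepare_coco_stuff_sem_seg.py",   # COCOStuff-171
    "datasets/prepare_ade20k_sem_seg.py",       # ADE20K-150
    "datasets/prepare_ade20k_full_sem_seg.py",  # ADE20K-847
    "datasets/prepare_voc_sem_seg.py",          # PASCALVOC-20
    "datasets/prepare_pascal_context.py",       # PASCALContext-59 and -459
]


def build_prepare_commands(python=sys.executable):
    """Return one argv list per preparation script."""
    return [[python, script] for script in PREPARE_SCRIPTS]


def prepare_all(dry_run=False):
    """Run every preparation script in turn, stopping at the first failure."""
    for cmd in build_prepare_commands():
        print("running:", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `prepare_all(dry_run=True)` prints the commands without executing them, which is a cheap way to review what would run.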