# Prepare Datasets for OneFormer
- A dataset can be used by accessing [DatasetCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.DatasetCatalog) for its data, or [MetadataCatalog](https://detectron2.readthedocs.io/modules/data.html#detectron2.data.MetadataCatalog) for its metadata (class names, etc.); a short usage sketch follows this list.
- This document explains how to set up the builtin datasets so they can be used by the above APIs. [Training OneFormer with Custom Datasets](https://github.com/SHI-Labs/OneFormer/tree/main/datasets/custom_datasets) gives a deeper dive into training OneFormer with custom datasets.
- Detectron2 has builtin support for a few datasets. The datasets are assumed to exist in a directory specified by the environment variable `DETECTRON2_DATASETS`. Under this directory, detectron2 will look for datasets in the structure described below, if needed.
```text
$DETECTRON2_DATASETS/
ADEChallengeData2016/
cityscapes/
coco/
mapillary_vistas/
```
- You can set the location for builtin datasets by `export DETECTRON2_DATASETS=/path/to/datasets`. If left unset, the default is `./datasets` relative to your current working directory.
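- To verify that the catalogs can see a dataset once its files are in place, query them from Python (a minimal sketch; `coco_2017_val` is a builtin detectron2 dataset name used here only as an example, and it assumes the COCO files described below have been set up):
```bash
python -c '
from detectron2.data import DatasetCatalog, MetadataCatalog
# "coco_2017_val" is a builtin name used as an example; any registered dataset works
meta = MetadataCatalog.get("coco_2017_val")   # metadata: class names, colors, ...
print(meta.thing_classes[:5])
dicts = DatasetCatalog.get("coco_2017_val")   # list of per-image annotation dicts
print(len(dicts))
'
```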
## Expected dataset structure for [ADE20K](http://sceneparsing.csail.mit.edu/)
```text
ADEChallengeData2016/
images/
annotations/
objectInfo150.txt
  # downloaded instance annotations (see below)
annotations_instance/
# generated by prepare_ade20k_sem_seg.py
annotations_detectron2/
# below are generated by prepare_ade20k_pan_seg.py
ade20k_panoptic_{train,val}.json
ade20k_panoptic_{train,val}/
# below are generated by prepare_ade20k_ins_seg.py
ade20k_instance_{train,val}.json
```
- Generate `annotations_detectron2`:
```bash
python datasets/prepare_ade20k_sem_seg.py
```
- Install panopticapi by:
```bash
pip install git+https://github.com/cocodataset/panopticapi.git
```
- Download the instance annotation from <http://sceneparsing.csail.mit.edu/>:
```bash
wget http://sceneparsing.csail.mit.edu/data/ChallengeData2017/annotations_instance.tar
```
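- Extract the archive so that `annotations_instance/` sits inside `ADEChallengeData2016/` as in the structure above (a minimal sketch; it assumes the default `./datasets` layout and that the tar unpacks to a top-level `annotations_instance/` directory):
```bash
# adjust the target path if DETECTRON2_DATASETS points elsewhere
tar -xf annotations_instance.tar -C datasets/ADEChallengeData2016/
```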
- Then run `python datasets/prepare_ade20k_pan_seg.py` to combine the semantic and instance annotations into panoptic annotations.
- Run `python datasets/prepare_ade20k_ins_seg.py` to extract instance annotations in COCO format.
## Expected dataset structure for [Cityscapes](https://www.cityscapes-dataset.com/downloads/)
```text
cityscapes/
gtFine/
train/
aachen/
color.png, instanceIds.png, labelIds.png, polygons.json,
labelTrainIds.png
...
val/
test/
    # below are generated by the Cityscapes panoptic script (see below)
cityscapes_panoptic_train.json
cityscapes_panoptic_train/
cityscapes_panoptic_val.json
cityscapes_panoptic_val/
cityscapes_panoptic_test.json
cityscapes_panoptic_test/
leftImg8bit/
train/
val/
test/
```
- Log in and download the dataset:
```bash
wget --keep-session-cookies --save-cookies=cookies.txt --post-data 'username=myusername&password=mypassword&submit=Login' https://www.cityscapes-dataset.com/login/
# gtFine (fine annotations)
wget --load-cookies cookies.txt --content-disposition "https://www.cityscapes-dataset.com/file-handling/?packageID=1"
# leftImg8bit (images)
wget --load-cookies cookies.txt --content-disposition "https://www.cityscapes-dataset.com/file-handling/?packageID=3"
```
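- Unpack both packages into the `cityscapes/` root (a minimal sketch; the archive names below are the standard ones served by the site and the target path assumes the default `./datasets` layout, so adjust both if yours differ):
```bash
# gtFine_trainvaltest.zip and leftImg8bit_trainvaltest.zip unpack to gtFine/ and leftImg8bit/
unzip gtFine_trainvaltest.zip -d datasets/cityscapes/
unzip leftImg8bit_trainvaltest.zip -d datasets/cityscapes/
```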
- Install cityscapes scripts by:
```bash
pip install git+https://github.com/mcordts/cityscapesScripts.git
```
- To create `labelTrainIds.png`, first prepare the above structure, then run the cityscapesScripts preparation script:
```bash
# clone the repo for the preparation scripts
git clone https://github.com/mcordts/cityscapesScripts.git
# point CITYSCAPES_DATASET at the cityscapes/ directory prepared above
CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesScripts/cityscapesscripts/preparation/createTrainIdLabelImgs.py
```
These files are not needed for instance segmentation.
- To generate the Cityscapes panoptic dataset, run the cityscapesScripts preparation script:
```bash
CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesScripts/cityscapesscripts/preparation/createPanopticImgs.py
```
These files are not needed for semantic and instance segmentation.
## Expected dataset structure for [COCO](https://cocodataset.org/#download)
```text
coco/
annotations/
instances_{train,val}2017.json
panoptic_{train,val}2017.json
    captions_{train,val}2017.json
# evaluate on instance labels derived from panoptic annotations
panoptic2instances_val2017.json
{train,val}2017/
# image files that are mentioned in the corresponding json
panoptic_{train,val}2017/ # png annotations
panoptic_semseg_{train,val}2017/ # generated by the script mentioned below
```
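- The images and annotations can be fetched from the [official COCO site](https://cocodataset.org/#download); a minimal sketch (standard cocodataset.org URLs; the target directory assumes the default `./datasets` layout):
```bash
mkdir -p datasets/coco && cd datasets/coco
# images
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
# instance/caption and panoptic annotations
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
wget http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip
unzip train2017.zip && unzip val2017.zip
unzip annotations_trainval2017.zip && unzip panoptic_annotations_trainval2017.zip
# the panoptic archive nests panoptic_{train,val}2017.zip (the PNG annotations);
# extract them so panoptic_{train,val}2017/ sits directly under coco/
unzip annotations/panoptic_train2017.zip && unzip annotations/panoptic_val2017.zip
```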
- Install panopticapi by:
```bash
pip install git+https://github.com/cocodataset/panopticapi.git
```
- Then run `python datasets/prepare_coco_semantic_annos_from_panoptic_annos.py` to extract semantic annotations from panoptic annotations (only used for evaluation).
- Then run the following command to convert the panoptic json into instance json format (used for evaluation on the instance segmentation task):
```bash
python datasets/panoptic2detection_coco_format.py --things_only
```
## Expected dataset structure for [Mapillary Vistas](https://www.mapillary.com/dataset/vistas)
```text
mapillary_vistas/
training/
images/
instances/
labels/
panoptic/
validation/
images/
instances/
labels/
panoptic/
  mapillary_vistas_instance_{train,val}.json # optional; instance segmentation is not evaluated (see note below)
```
No preprocessing is needed for Mapillary Vistas for semantic and panoptic segmentation.
We do not evaluate the instance segmentation task on the Mapillary Vistas dataset.
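The dataset itself must be requested from Mapillary; after download, unpack it into `mapillary_vistas/` (a minimal sketch; the archive name below assumes the v1.2 public release and may differ for your download):
```bash
# archive name is an assumption; adjust to match your download
unzip mapillary-vistas-dataset_public_v1.2.zip -d datasets/mapillary_vistas/
```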