# Prepare datasets for Detic
The basic training of our model uses [LVIS](https://www.lvisdataset.org/) (which uses [COCO](https://cocodataset.org/) images) and [ImageNet-21K](https://www.image-net.org/download.php).
Some models are trained on [Conceptual Captions (CC3M)](https://ai.google.com/research/ConceptualCaptions/).
Optionally, we use [Objects365](https://www.objects365.org/) and [OpenImages (Challenge 2019 version)](https://storage.googleapis.com/openimages/web/challenge2019.html) for cross-dataset evaluation.
Before starting the processing, please download the (selected) datasets from the official websites and place or sym-link them under `$Detic_ROOT/datasets/`.
```
$Detic_ROOT/datasets/
    metadata/
    lvis/
    coco/
    imagenet/
    cc3m/
    objects365/
    oid/
```
`metadata/` contains our preprocessed metadata (included in the repo). See the [Metadata](#Metadata) section below for details.
Please follow the instructions below to pre-process the individual datasets.
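If the datasets are already downloaded elsewhere on disk, sym-linking them into place is usually easiest. A minimal sketch (the source paths below are placeholders for your own download locations):
```
import os

# Hypothetical download locations; replace with wherever your copies live.
links = {
    '/path/to/coco': 'datasets/coco',
    '/path/to/lvis': 'datasets/lvis',
    '/path/to/imagenet-21k': 'datasets/imagenet/ImageNet-21K',
}

for src, dst in links.items():
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    if not os.path.exists(dst):
        os.symlink(src, dst)  # link the dataset into $Detic_ROOT/datasets/
```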
### COCO and LVIS
First, download the COCO and LVIS data and place them as follows:
```
lvis/
    lvis_v1_train.json
    lvis_v1_val.json
coco/
    train2017/
    val2017/
    annotations/
        captions_train2017.json
        instances_train2017.json
        instances_val2017.json
```
Next, prepare the open-vocabulary LVIS training set using
```
python tools/remove_lvis_rare.py --ann datasets/lvis/lvis_v1_train.json
```
This will generate `datasets/lvis/lvis_v1_train_norare.json`.
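For reference, the core of this step is simply dropping the annotations of LVIS rare-frequency categories while keeping all images and the full category list. A rough sketch of that logic (the script itself is authoritative):
```
import json

# Drop annotations whose category is tagged 'r' (rare) in LVIS v1;
# images and the category list are left untouched.
data = json.load(open('datasets/lvis/lvis_v1_train.json'))
rare_ids = {c['id'] for c in data['categories'] if c['frequency'] == 'r'}
data['annotations'] = [a for a in data['annotations']
                       if a['category_id'] not in rare_ids]
json.dump(data, open('datasets/lvis/lvis_v1_train_norare.json', 'w'))
```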
### ImageNet-21K
The ImageNet-21K folder should look like:
```
imagenet/
    ImageNet-21K/
        n01593028.tar
        n01593282.tar
        ...
```
We first extract the classes that overlap with LVIS (we work directly with the .tar files for the remaining classes) and convert them into the LVIS annotation format.
~~~
mkdir imagenet/annotations
python tools/unzip_imagenet_lvis.py --dst_path datasets/imagenet/ImageNet-LVIS
python tools/create_imagenetlvis_json.py --imagenet_path datasets/imagenet/ImageNet-LVIS --out_path datasets/imagenet/annotations/imagenet_lvis_image_info.json
~~~
This creates `datasets/imagenet/annotations/imagenet_lvis_image_info.json`.
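The resulting file is an LVIS-style annotation that contains only image entries and the LVIS category list, with image-level labels and no boxes. A rough sketch of how such a file could be assembled (the wnid-to-category matching via WordNet synsets and the `pos_category_ids` field name are assumptions on my part; `tools/create_imagenetlvis_json.py` is authoritative):
```
import json, os
from PIL import Image
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

# Map LVIS categories to WordNet ids (wnids) via their 'synset' field.
# A handful of LVIS synsets are not plain WordNet entries; those are skipped here.
lvis = json.load(open('datasets/lvis/lvis_v1_val.json'))
wnid2catid = {}
for c in lvis['categories']:
    try:
        s = wn.synset(c['synset'])
        wnid2catid['{}{:08d}'.format(s.pos(), s.offset())] = c['id']
    except Exception:
        pass

# Build LVIS-style image entries with image-level labels only (no boxes).
images, root = [], 'datasets/imagenet/ImageNet-LVIS'
for wnid in sorted(os.listdir(root)):
    if wnid not in wnid2catid:
        continue
    for fname in sorted(os.listdir(os.path.join(root, wnid))):
        w, h = Image.open(os.path.join(root, wnid, fname)).size
        images.append({'id': len(images) + 1, 'file_name': f'{wnid}/{fname}',
                       'width': w, 'height': h,
                       'pos_category_ids': [wnid2catid[wnid]]})

out = {'images': images, 'categories': lvis['categories'], 'annotations': []}
json.dump(out, open('datasets/imagenet/annotations/imagenet_lvis_image_info.json', 'w'))
```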
[Optional] To train with all the 21K classes, run
~~~
python tools/get_imagenet_21k_full_tar_json.py
python tools/create_lvis_21k.py
~~~
This creates `datasets/imagenet/annotations/imagenet-21k_image_info_lvis-21k.json` and `datasets/lvis/lvis_v1_train_lvis-21k.json` (with the combined LVIS and ImageNet-21K classes in their `categories` fields).
[Optional] To train on combined LVIS and COCO, run
~~~
python tools/merge_lvis_coco.py
~~~
This creates `datasets/lvis/lvis_v1_train+coco_mask.json`.
### Conceptual Captions
Download the dataset from [this](https://ai.google.com/research/ConceptualCaptions/download) page and place it as:
```
cc3m/
    GCC-training.tsv
```
Run the following commands to download the images and convert the annotations to the LVIS format (note: downloading the images takes a long time).
~~~
python tools/download_cc.py --ann datasets/cc3m/GCC-training.tsv --save_image_path datasets/cc3m/training/ --out_path datasets/cc3m/train_image_info.json
python tools/get_cc_tags.py
~~~
This creates `datasets/cc3m/train_image_info_tags.json`.
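`tools/get_cc_tags.py` mines image-level tags from the captions. Conceptually, a caption yields a positive label for every LVIS category whose name appears in it; a simplified sketch of that idea (the caption field name and the exact matching rules are assumptions, the script is authoritative):
```
import json

# Mark an LVIS category as a positive image-level label whenever one of its
# names occurs in the image caption.
lvis = json.load(open('datasets/lvis/lvis_v1_val.json'))
name2catid = {}
for c in lvis['categories']:
    for name in c.get('synonyms', [c['name']]):
        name2catid[name.replace('_', ' ').lower()] = c['id']

data = json.load(open('datasets/cc3m/train_image_info.json'))
for img in data['images']:
    caption = img.get('caption', '').lower()  # 'caption' field name is an assumption
    img['pos_category_ids'] = sorted({cid for name, cid in name2catid.items()
                                      if name in caption})
json.dump(data, open('datasets/cc3m/train_image_info_tags.json', 'w'))
```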
### Objects365
Download Objects365 (v2) from the website. We only need the validation set in this project:
```
objects365/
    annotations/
        zhiyuan_objv2_val.json
    val/
        images/
            v1/
                patch0/
                ...
                patch15/
            v2/
                patch16/
                ...
                patch49/
```
The original annotations contain typos in the class names; we first fix them for our later use of language embeddings.
```
python tools/fix_o365_names.py --ann datasets/objects365/annotations/zhiyuan_objv2_val.json
```
This creates `datasets/objects365/zhiyuan_objv2_val_fixname.json`.
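Conceptually, the fix only rewrites the `name` field of each category using the corrections listed in `metadata/Objects365_names_fix.csv`. A minimal sketch (the CSV column names used below are illustrative assumptions; check the actual file for its layout):
```
import csv, json

# Replace typo'd Objects365 class names before computing text embeddings.
# The column layout (id, name, fixed_name) is an assumption for illustration.
fixes = {}
with open('datasets/metadata/Objects365_names_fix.csv') as f:
    for row in csv.DictReader(f):
        fixes[int(row['id'])] = row['fixed_name']

data = json.load(open('datasets/objects365/annotations/zhiyuan_objv2_val.json'))
for c in data['categories']:
    if c['id'] in fixes:
        c['name'] = fixes[c['id']]
json.dump(data, open('datasets/objects365/zhiyuan_objv2_val_fixname.json', 'w'))
```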
To train on Objects365, download the training images and use the command above. We note that some images in the training annotations do not exist.
We use the following command to filter out the missing images.
~~~
python tools/fix_0365_path.py
~~~
This creates `datasets/objects365/zhiyuan_objv2_train_fixname_fixmiss.json`.
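The filtering itself amounts to dropping image entries (and their annotations) whose files are not present on disk. A rough sketch, with assumed input and image-root paths:
```
import json, os

# Drop image entries (and their annotations) whose files are missing on disk.
# Input path and image root are assumptions; match them to your layout.
data = json.load(open('datasets/objects365/zhiyuan_objv2_train_fixname.json'))
root = 'datasets/objects365/train'
exists = {img['id'] for img in data['images']
          if os.path.exists(os.path.join(root, img['file_name']))}
data['images'] = [img for img in data['images'] if img['id'] in exists]
data['annotations'] = [a for a in data['annotations'] if a['image_id'] in exists]
json.dump(data, open('datasets/objects365/zhiyuan_objv2_train_fixname_fixmiss.json', 'w'))
```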
### OpenImages
We followed the instructions in [UniDet](https://github.com/xingyizhou/UniDet/blob/master/projects/UniDet/unidet_docs/DATASETS.md#openimages) to convert the metadata for OpenImages.
The converted folder should look like:
```
oid/
    annotations/
        oid_challenge_2019_train_bbox.json
        oid_challenge_2019_val_expanded.json
    images/
        0/
        1/
        2/
        ...
```
### Open-vocabulary COCO
We first follow [OVR-CNN](https://github.com/alirezazareian/ovr-cnn/blob/master/ipynb/003.ipynb) to create the open-vocabulary COCO split. The converted files should look like:
```
coco/
    zero-shot/
        instances_train2017_seen_2.json
        instances_val2017_all_2.json
```
We further pre-process the annotation format for easier evaluation:
```
python tools/get_coco_zeroshot_oriorder.py --data_path datasets/coco/zero-shot/instances_train2017_seen_2.json
python tools/get_coco_zeroshot_oriorder.py --data_path datasets/coco/zero-shot/instances_val2017_all_2.json
```
Next, we preprocess the COCO caption data:
```
python tools/get_cc_tags.py --cc_ann datasets/coco/annotations/captions_train2017.json --out_path datasets/coco/captions_train2017_tags_allcaps.json --allcaps --convert_caption
```
This creates `datasets/coco/captions_train2017_tags_allcaps.json`.
### Metadata
```
metadata/
    lvis_v1_train_cat_info.json
    coco_clip_a+cname.npy
    lvis_v1_clip_a+cname.npy
    o365_clip_a+cnamefix.npy
    oid_clip_a+cname.npy
    imagenet_lvis_wnid.txt
    Objects365_names_fix.csv
```
`lvis_v1_train_cat_info.json` is used by the Federated loss.
It is created by
~~~
python tools/get_lvis_cat_info.py --ann datasets/lvis/lvis_v1_train.json
~~~
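The file stores, for each LVIS category, how many training images contain it; the Federated loss uses these counts when sampling the category subset that contributes to the classification loss. A rough sketch of how the counts can be computed (the output location and extra fields are assumptions; the script is authoritative):
```
import json
from collections import defaultdict

# Count, for each LVIS category, how many training images contain it.
data = json.load(open('datasets/lvis/lvis_v1_train.json'))
img_cats = defaultdict(set)
for a in data['annotations']:
    img_cats[a['image_id']].add(a['category_id'])

counts = defaultdict(int)
for cats in img_cats.values():
    for c in cats:
        counts[c] += 1

# Attach the recomputed image counts to each category entry.
cat_info = [dict(c, image_count=counts.get(c['id'], 0)) for c in data['categories']]
json.dump(cat_info, open('datasets/metadata/lvis_v1_train_cat_info.json', 'w'))
```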
The `*_clip_a+cname.npy` files are the pre-computed CLIP embeddings for each dataset.
They are created by (taking LVIS as an example)
~~~
python tools/dump_clip_features.py --ann datasets/lvis/lvis_v1_val.json --out_path metadata/lvis_v1_clip_a+cname.npy
~~~
Note that we do not include the 21K-class embeddings due to their large file size.
To create them, run
~~~
python tools/dump_clip_features.py --ann datasets/lvis/lvis_v1_val_lvis-21k.json --out_path datasets/metadata/lvis-21k_clip_a+cname.npy
~~~
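For reference, the "a+cname" naming refers to embedding each class name with the prompt `a <class name>` using CLIP's text encoder. A minimal sketch of that idea using the [openai/CLIP](https://github.com/openai/CLIP) package (the backbone choice and normalization details are assumptions; `tools/dump_clip_features.py` is authoritative):
```
import json
import numpy as np
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

# Embed each class name with the prompt "a <class name>" and save the matrix.
model, _ = clip.load('ViT-B/32', device='cpu')
cats = json.load(open('datasets/lvis/lvis_v1_val.json'))['categories']
prompts = ['a ' + c['name'].replace('_', ' ')
           for c in sorted(cats, key=lambda x: x['id'])]
with torch.no_grad():
    feats = model.encode_text(clip.tokenize(prompts))
np.save('datasets/metadata/lvis_v1_clip_a+cname.npy', feats.float().numpy())
```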
`imagenet_lvis_wnid.txt` is the list of matched classes between ImageNet-21K and LVIS.
`Objects365_names_fix.csv` is our manual fix of the Objects365 names.