## Generation of crops from the real datasets
The instructions below generate the crops used for pre-training CroCo v2 from the following real-world datasets: ARKitScenes, MegaDepth, 3DStreetView and IndoorVL.
### Download the metadata of the crops to generate
First, download the metadata and put it in `./data/`:
```
mkdir -p data
cd data/
wget https://download.europe.naverlabs.com/ComputerVision/CroCo/data/crop_metadata.zip
unzip crop_metadata.zip
rm crop_metadata.zip
cd ..
```
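As an optional sanity check, the sketch below reports whether the crop list of each dataset is present after unzipping. It is not part of the CroCo repository: the function name `check_crop_metadata` is hypothetical, and it assumes the archive unpacks to `crop_metadata/<dataset>/crops_release.txt` under the data directory, which is the path the extraction command in this README expects.

```shell
# Sketch only: check that each dataset's crop list exists and is non-empty.
# Assumed layout: <data_dir>/crop_metadata/<dataset>/crops_release.txt
check_crop_metadata() {
    local data_dir="${1:-./data}"
    local missing=0
    local dataset crops_file
    for dataset in ARKitScenes MegaDepth 3DStreetView IndoorVL; do
        crops_file="${data_dir}/crop_metadata/${dataset}/crops_release.txt"
        if [ -s "${crops_file}" ]; then
            echo "OK       ${crops_file}"
        else
            echo "MISSING  ${crops_file}"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every crop list was found.
    [ "${missing}" -eq 0 ]
}
```

Calling `check_crop_metadata ./data` from the repository root prints one line per dataset and returns a non-zero status if any list is missing or empty.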
### Prepare the original datasets
Second, download the original datasets into `./data/original_datasets/`.
```
mkdir -p data/original_datasets
```
##### ARKitScenes
Download the `raw` dataset from https://github.com/apple/ARKitScenes/blob/main/DATA.md and put it in `./data/original_datasets/ARKitScenes/`.
The resulting file structure should look like:
```
./data/original_datasets/ARKitScenes/
└───Training
    ├───40753679
    │   │   ultrawide
    │   │   ...
    ├───40753686
    │
    ...
```
##### MegaDepth
Download `MegaDepth v1 Dataset` from https://www.cs.cornell.edu/projects/megadepth/ and put it in `./data/original_datasets/MegaDepth/`.
The resulting file structure should look like:
```
./data/original_datasets/MegaDepth/
├───0000
│   ├───images
│   │   │   1000557903_87fa96b8a4_o.jpg
│   │   │   ...
│   └─── ...
├───0001
│   │
│   │   ...
└─── ...
```
##### 3DStreetView
Download the `3D_Street_View` dataset from https://github.com/amir32002/3D_Street_View and put it in `./data/original_datasets/3DStreetView/`.
The resulting file structure should look like:
```
./data/original_datasets/3DStreetView/
├───dataset_aligned
│   ├───0002
│   │   │   0000002_0000001_0000002_0000001.jpg
│   │   │   ...
│   └─── ...
├───dataset_unaligned
│   ├───0003
│   │   │   0000003_0000001_0000002_0000001.jpg
│   │   │   ...
│   └─── ...
```
##### IndoorVL
Download the `IndoorVL` datasets using [Kapture](https://github.com/naver/kapture).
```
pip install kapture
mkdir -p ./data/original_datasets/IndoorVL
cd ./data/original_datasets/IndoorVL
kapture_download_dataset.py update
kapture_download_dataset.py install "HyundaiDepartmentStore_*"
kapture_download_dataset.py install "GangnamStation_*"
cd -
```
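Once all four datasets are downloaded, a quick check of the top-level directories can catch a misplaced archive before the extraction step. The sketch below is an illustration, not part of the repository: `check_dataset_layout` is a hypothetical helper, and it only tests the directories named in this README. For IndoorVL it checks the root only, since the layout produced by Kapture is not described here.

```shell
# Sketch only: check that the directories named in this README exist.
check_dataset_layout() {
    local root="${1:-./data/original_datasets}"
    local missing=0
    local path
    for path in \
        "ARKitScenes/Training" \
        "MegaDepth/0000/images" \
        "3DStreetView/dataset_aligned" \
        "3DStreetView/dataset_unaligned" \
        "IndoorVL"; do
        if [ -d "${root}/${path}" ]; then
            echo "OK       ${root}/${path}"
        else
            echo "MISSING  ${root}/${path}"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every expected directory was found.
    [ "${missing}" -eq 0 ]
}
```

Running `check_dataset_layout` with no argument inspects `./data/original_datasets`; a different root can be passed as the first argument.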
### Extract the crops
Now, extract the crops for each dataset:
```
for dataset in ARKitScenes MegaDepth 3DStreetView IndoorVL;
do
python3 datasets/crops/extract_crops_from_images.py --crops ./data/crop_metadata/${dataset}/crops_release.txt --root-dir ./data/original_datasets/${dataset}/ --output-dir ./data/${dataset}_crops/ --imsize 256 --nthread 8 --max-subdir-levels 5 --ideal-number-pairs-in-dir 500;
done
```
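After the loop finishes, it can be useful to confirm that each output directory was actually populated. The sketch below counts regular files under each `./data/<dataset>_crops/` directory used in the command above. The helper name `count_crop_files` is hypothetical. The naming and format of the extracted files are not described here, so the count is only a rough indicator and need not equal the number of pairs.

```shell
# Sketch only: print "<dataset> <number of files>" for each output directory.
# A missing output directory is reported with a count of 0.
count_crop_files() {
    local data_dir="${1:-./data}"
    local dataset out_dir n
    for dataset in ARKitScenes MegaDepth 3DStreetView IndoorVL; do
        out_dir="${data_dir}/${dataset}_crops"
        if [ -d "${out_dir}" ]; then
            # tr strips the padding some wc implementations add.
            n="$(find "${out_dir}" -type f | wc -l | tr -d ' ')"
        else
            n=0
        fi
        echo "${dataset} ${n}"
    done
}
```

A dataset reported with `0` files most likely means the extraction did not run or the dataset root was empty.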
##### Note for IndoorVL
For legal reasons, we can only release 144,228 of the 1,593,689 pairs used in the paper.
To compensate in terms of the number of pre-training iterations, the pre-training command in this repository uses 125 training epochs, including 12 warm-up epochs, and a cosine learning-rate schedule of 250 epochs, instead of 100, 10 and 200 respectively (each value scaled by roughly 1.25).
The impact on performance is negligible.