RDD: Robust Feature Detector and Descriptor using Deformable Transformer (CVPR 2025)

Gonglin Chen · Tianwen Fu · Haiwei Chen · Wenbin Teng · Hanyuan Xiao · Yajie Zhao

Project Page


Updates

  • SfM reconstruction through COLMAP added. We provide a ready-to-use notebook for a simple example. Code adapted from hloc.

  • Training code and new weights released.

  • We have updated the training code compared to what was described in the paper. In the original setup, RDD was trained on the MegaDepth and Air-to-Ground datasets with all images resized to the training resolution. In this release, we retrained RDD on MegaDepth only, using a combination of resizing and cropping, a strategy borrowed from ALIKE (a rough sketch of this augmentation follows the table below). This change significantly improves robustness.

Method      MegaDepth-1500            MegaDepth-View            Air-to-Ground
            AUC 5°  AUC 10° AUC 20°   AUC 5°  AUC 10° AUC 20°   AUC 5°  AUC 10° AUC 20°
RDD-v2      52.4    68.5    80.1      52.0    67.1    78.2      45.8    58.6    71.0
RDD-v1      48.2    65.2    78.3      38.3    53.1    65.6      41.4    56.0    67.8
RDD-v2+LG   53.3    69.8    82.0      59.0    74.2    84.0      54.8    69.0    79.1
RDD-v1+LG   52.3    68.9    81.8      54.2    69.3    80.3      55.1    68.9    78.9
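
For readers curious about the resize-and-crop strategy mentioned in the last update above, here is a rough sketch of such an augmentation (an illustrative example only, not the actual training pipeline; the resolution, crop size, and function name are assumptions):

# Minimal sketch of a resize-then-random-crop augmentation, similar in spirit
# to the strategy borrowed from ALIKE. The resolution and crop size here are
# illustrative assumptions, not the values used to train RDD.
import random
import cv2
import numpy as np

def resize_and_crop(image: np.ndarray, resize_short: int = 600, crop: int = 480) -> np.ndarray:
    """Resize the shorter side to `resize_short`, then take a random square crop."""
    h, w = image.shape[:2]
    scale = resize_short / min(h, w)
    image = cv2.resize(image, (int(round(w * scale)), int(round(h * scale))))
    h, w = image.shape[:2]
    y = random.randint(0, h - crop)
    x = random.randint(0, w - crop)
    # In the real pipeline the depth maps and camera intrinsics would need the
    # same resize/crop applied so that ground-truth correspondences stay valid.
    return image[y:y + crop, x:x + crop]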

Installation

git clone --recursive https://github.com/xtcpete/rdd
cd RDD

# Create conda env
conda create -n rdd python=3.10 pip
conda activate rdd

# Install CUDA 
conda install -c nvidia/label/cuda-11.8.0 cuda-toolkit
# Install torch
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
# Install all dependencies
pip install -r requirements.txt
# Compile custom operations
cd ./RDD/models/ops
pip install -e .
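
After the custom ops are compiled, a quick environment check can catch version mismatches early (a minimal sketch; it only verifies the pinned PyTorch/CUDA build, not the compiled deformable-attention extension itself):

# Quick sanity check of the installed environment.
import torch

print("torch:", torch.__version__)               # expected: 2.5.1+cu118
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))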

We provide download links to:

  • the MegaDepth-1500 test set
  • the MegaDepth-View test set
  • the Air-to-Ground test set
  • two pretrained models: RDD and a LightGlue model trained for matching RDD features

Create the data folder and unzip the downloaded test data into it.

Create the weights folder, add the downloaded weights to it, and you are ready to go.

Usage

For your convenience, we provide a ready-to-use notebook for some examples.

Inference

import torch

from RDD.RDD import build

RDD_model = build()

# Extract keypoints and descriptors from a dummy 1x3x480x640 image tensor
output = RDD_model.extract(torch.randn(1, 3, 480, 640))
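
A slightly more realistic sketch feeds an actual image rather than random noise (the preprocessing below and the exact contents of output are assumptions made for illustration; the notebook shows the intended extract/match API):

# Sketch: run extraction on a real image instead of random noise.
# The normalization and the structure of `output` are assumptions for
# illustration; see the notebook for the exact API.
import cv2
import torch
from RDD.RDD import build

RDD_model = build()

img = cv2.imread("example.jpg")                              # BGR, HxWx3, uint8
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
tensor = torch.from_numpy(img).float().permute(2, 0, 1)[None] / 255.0  # 1x3xHxW

output = RDD_model.extract(tensor)
# `output` is expected to hold keypoints and descriptors for the image, which
# the sparse / dense / LightGlue matchers in the benchmarks consume.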

Evaluation

Please note that due to differences in GPU architectures and the stochastic nature of RANSAC, you may observe slightly different results; however, they should be very close to those reported in the paper. To reproduce the numbers in the paper, use the v1 weights instead.

Results can be visualized by passing the --plot argument.
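
To make the note about RANSAC concrete: pose-AUC benchmarks of this kind typically estimate the relative pose from matched keypoints with a RANSAC-based essential-matrix solver, and the random sampling inside RANSAC is what causes the small run-to-run differences. A generic OpenCV sketch of that step (not the benchmark scripts' actual code):

# Generic sketch of RANSAC-based relative-pose estimation from matched
# keypoints; the random sampling inside RANSAC is why repeated runs give
# slightly different pose-AUC numbers. Not the benchmark scripts' code.
import cv2
import numpy as np

def estimate_pose(kpts0, kpts1, K0, K1):
    """kpts0, kpts1: Nx2 matched pixel coordinates; K0, K1: 3x3 intrinsics."""
    # Normalize coordinates so a single identity camera matrix can be used.
    pts0 = cv2.undistortPoints(kpts0.astype(np.float64).reshape(-1, 1, 2), K0, None).reshape(-1, 2)
    pts1 = cv2.undistortPoints(kpts1.astype(np.float64).reshape(-1, 1, 2), K1, None).reshape(-1, 2)
    E, inliers = cv2.findEssentialMat(pts0, pts1, np.eye(3), method=cv2.RANSAC,
                                      prob=0.9999, threshold=1e-3)
    _, R, t, _ = cv2.recoverPose(E, pts0, pts1, np.eye(3), mask=inliers)
    return R, t, inliers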

MegaDepth-1500

# Sparse matching
python ./benchmarks/mega_1500.py

# Dense matching
python ./benchmarks/mega_1500.py --method dense

# LightGlue
python ./benchmarks/mega_1500.py --method lightglue

MegaDepth-View

# Sparse matching
python ./benchmarks/mega_view.py

# Dense matching
python ./benchmarks/mega_view.py --method dense

# LightGlue
python ./benchmarks/mega_view.py --method lightglue

Air-to-Ground

# Sparse matching
python ./benchmarks/air_ground.py

# Dense matching
python ./benchmarks/air_ground.py --method dense

# LightGlue
python ./benchmarks/air_ground.py --method lightglue

Training

  1. Download the MegaDepth dataset using download.sh and the megadepth_indices from LoFTR. The MegaDepth root folder should then look like the following (a short script to verify this layout is sketched after step 2):
./data/megadepth/megadepth_indices # indices
./data/megadepth/depth_undistorted # depth maps
./data/megadepth/Undistorted_SfM # images and poses
./data/megadepth/scene_info # indices for training LightGlue
  2. Then you can train RDD in two steps: the descriptor first
# distributed training with 8 gpus
python -m training.train --ckpt_save_path ./ckpt_descriptor --distributed --batch_size 32

# single gpu 
python -m training.train --ckpt_save_path ./ckpt_descriptor

and then the detector

python -m training.train --ckpt_save_path ./ckpt_detector --weights ./ckpt_descriptor/RDD_best.pth --train_detector --training_res 480
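
Before launching either step, the MegaDepth layout from step 1 can be verified with a few lines (a minimal sketch using only the paths listed above):

# Sanity-check the MegaDepth layout described in step 1 before training.
from pathlib import Path

root = Path("./data/megadepth")
for sub in ["megadepth_indices", "depth_undistorted", "Undistorted_SfM", "scene_info"]:
    path = root / sub
    print(f"{path}: {'ok' if path.is_dir() else 'MISSING'}")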

We are re-collecting the Air-to-Ground dataset because of licensing issues.

Citation

@inproceedings{gonglin2025rdd,
    title     = {RDD: Robust Feature Detector and Descriptor using Deformable Transformer},
    author    = {Chen, Gonglin and Fu, Tianwen and Chen, Haiwei and Teng, Wenbin and Xiao, Hanyuan and Zhao, Yajie},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2025}
}

License


Acknowledgements

We thank these great repositories: ALIKE, LoFTR, DeDoDe, XFeat, LightGlue, Kornia, and Deformable DETR, and many other inspiring works in the community.

LightGlue is trained with Glue Factory.

Supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DOI/IBC) contract number 140D0423C0075. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S. Government. We would like to thank Yayue Chen for her help with visualization.