<div align="center">

# \[CVPR'24\] Code release for OmniGlue (ONNX)

[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/Realcat/image-matching-webui)

<p align="center">
  <a href="https://hwjiang1510.github.io/">Hanwen Jiang</a>,
  <a href="https://scholar.google.com/citations?user=jgSItF4AAAAJ">Arjun Karpur</a>,
  <a href="https://scholar.google.com/citations?user=7EeSOcgAAAAJ">Bingyi Cao</a>,
  <a href="https://www.cs.utexas.edu/~huangqx/">Qixing Huang</a>,
  <a href="https://andrefaraujo.github.io/">Andre Araujo</a>
</p>

</div>

--------------------------------------------------------------------------------

<div align="center">
  <a href="https://hwjiang1510.github.io/OmniGlue/"><strong>Project Page</strong></a> |
  <a href="https://arxiv.org/abs/2405.12979"><strong>Paper</strong></a> |
  <a href="#installation"><strong>Usage</strong></a> |
  <a href="https://huggingface.co/spaces/qubvel-hf/omniglue"><strong>Demo</strong></a>
</div>

<br>

ONNX-compatible release for the CVPR 2024 paper: **OmniGlue: Generalizable Feature
Matching with Foundation Model Guidance**.

![og_diagram.png](res/og_diagram.png "og_diagram.png")
**Abstract:** The image matching field has been witnessing a continuous
emergence of novel learnable feature matching techniques, with ever-improving
performance on conventional benchmarks. However, our investigation shows that
despite these gains, their potential for real-world applications is restricted
by their limited generalization capabilities to novel image domains. In this
paper, we introduce OmniGlue, the first learnable image matcher that is designed
with generalization as a core principle. OmniGlue leverages broad knowledge from
a vision foundation model to guide the feature matching process, boosting
generalization to domains not seen at training time. Additionally, we propose a
novel keypoint position-guided attention mechanism which disentangles spatial
and appearance information, leading to enhanced matching descriptors. We perform
comprehensive experiments on a suite of 6 datasets with varied image domains,
including scene-level, object-centric and aerial images. OmniGlue’s novel
components lead to relative gains on unseen domains of 18.8% with respect to a
directly comparable reference model, while also outperforming the recent
LightGlue method by 10.1% relatively.
## Installation

First, use pip to install `omniglue`:

```sh
conda create -n omniglue pip
conda activate omniglue
git clone https://github.com/google-research/omniglue.git
cd omniglue
pip install -e .
```
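
If the editable install went through, a quick import check from the repo root can confirm the environment is set up. This is only a minimal sanity sketch; it assumes the `src.omniglue` module layout used in the Usage snippet below and that `onnxruntime` is installed as the inference backend, so adjust it if your setup differs.

```py
# Minimal sanity check, run from the repo root.
# Assumes the `src.omniglue` layout from the Usage section below and that
# onnxruntime is the ONNX inference backend; adjust if your setup differs.
import onnxruntime
from src import omniglue

print("onnxruntime version:", onnxruntime.__version__)
print("omniglue module loaded from:", omniglue.__file__)
```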
Then, download the following models to `./models/`:

```sh
# Download to ./models/ dir.
mkdir models
cd models

# SuperPoint.
git clone https://github.com/rpautrat/SuperPoint.git
mv SuperPoint/pretrained_models/sp_v6.tgz . && rm -rf SuperPoint
tar zxvf sp_v6.tgz && rm sp_v6.tgz

# DINOv2 - vit-b14.
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_pretrain.pth

# OmniGlue.
wget https://storage.googleapis.com/omniglue/og_export.zip
unzip og_export.zip && rm og_export.zip
```
Direct download links:

- [[SuperPoint weights]](https://github.com/rpautrat/SuperPoint/tree/master/pretrained_models): from [github.com/rpautrat/SuperPoint](https://github.com/rpautrat/SuperPoint)
- [[DINOv2 weights]](https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_pretrain.pth): from [github.com/facebookresearch/dinov2](https://github.com/facebookresearch/dinov2) (ViT-B/14 distilled backbone without register).
- [[OmniGlue weights]](https://storage.googleapis.com/omniglue/og_export.zip)
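
Before moving on, it can help to verify that the model files landed where the Usage snippet below expects them. A minimal check, assuming the filenames used in that snippet (adjust the list if your exports are named differently):

```py
from pathlib import Path

# Paths assumed from the Usage snippet below; adjust if your exports differ.
expected = [
    "models/sp_v6.onnx",                  # SuperPoint keypoint extractor (ONNX export)
    "models/dinov2_vitb14_pretrain.pth",  # DINOv2 ViT-B/14 backbone weights
    "models/omniglue.onnx",               # OmniGlue matcher (ONNX export)
]

missing = [p for p in expected if not Path(p).exists()]
print("Missing files:", missing or "none; all expected model files are present.")
```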
## Usage

The code snippet below outlines how you can perform OmniGlue inference in your
own Python codebase.

```py
from src import omniglue

image0 = ...  # load images from file into np.array
image1 = ...

og = omniglue.OmniGlue(
    og_export="./models/omniglue.onnx",
    sp_export="./models/sp_v6.onnx",
    dino_export="./models/dinov2_vitb14_pretrain.pth",
)

match_kp0s, match_kp1s, match_confidences = og.FindMatches(image0, image1)
# Output:
#   match_kp0s: (N, 2) array of (x, y) coordinates in image0.
#   match_kp1s: (N, 2) array of (x, y) coordinates in image1.
#   match_confidences: N-dim array of each of the N match confidence scores.
```
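
The snippet above leaves image loading up to you (`image0 = ...`). One way to produce the expected `np.array` inputs is with PIL, as sketched below; the assumption that OmniGlue accepts (H, W, 3) uint8 RGB arrays is not spelled out here, so check `demo.py` for the exact loading used in this repo.

```py
import numpy as np
from PIL import Image

def load_rgb(path: str) -> np.ndarray:
    """Load an image file as an (H, W, 3) uint8 RGB array."""
    return np.array(Image.open(path).convert("RGB"))

image0 = load_rgb("./res/demo1.jpg")
image1 = load_rgb("./res/demo2.jpg")
match_kp0s, match_kp1s, match_confidences = og.FindMatches(image0, image1)
```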
## Demo

`demo.py` contains example usage of the `omniglue` module. To try with your own
images, replace `./res/demo1.jpg` and `./res/demo2.jpg` with your own
filepaths.

```sh
conda activate omniglue
python demo.py ./res/demo1.jpg ./res/demo2.jpg
# <see output in './demo_output.png'>
```

Expected output:

![demo_output.png](res/demo_output.png "demo_output.png")
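
For a quick side-by-side visualization of your own matches outside of `demo.py`, a matplotlib sketch like the one below can be used. It is only an illustration, not the actual `demo.py` implementation; the helper name and the 0.02 confidence threshold are arbitrary example choices.

```py
import matplotlib.pyplot as plt
import numpy as np

def draw_matches(image0, image1, kp0, kp1, conf, thresh=0.02, out_path="my_matches.png"):
    """Place the two images side by side and draw one line per match above `thresh`."""
    height = max(image0.shape[0], image1.shape[0])
    canvas = np.zeros((height, image0.shape[1] + image1.shape[1], 3), dtype=np.uint8)
    canvas[: image0.shape[0], : image0.shape[1]] = image0
    canvas[: image1.shape[0], image0.shape[1] :] = image1

    plt.figure(figsize=(12, 6))
    plt.imshow(canvas)
    keep = conf > thresh
    offset = image0.shape[1]  # x-offset of image1 on the shared canvas
    for (x0, y0), (x1, y1) in zip(kp0[keep], kp1[keep]):
        plt.plot([x0, x1 + offset], [y0, y1], linewidth=0.5)
    plt.axis("off")
    plt.savefig(out_path, bbox_inches="tight")

draw_matches(image0, image1, match_kp0s, match_kp1s, match_confidences)
```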
Comparison of results between TensorFlow and ONNX:

![result_tf_and_onnx.png](res/result_tf_and_onnx.png "result_tf_and_onnx.png")

## Repo TODOs

- ~~Provide `demo.py` example usage script.~~
- Support matching for pre-extracted features.
- Release eval pipelines for in-domain (MegaDepth).
- Release eval pipelines for all out-of-domain datasets.
## BibTeX

```
@inproceedings{jiang2024Omniglue,
  title={OmniGlue: Generalizable Feature Matching with Foundation Model Guidance},
  author={Jiang, Hanwen and Karpur, Arjun and Cao, Bingyi and Huang, Qixing and Araujo, Andre},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024},
}
```

--------------------------------------------------------------------------------
This is not an officially supported Google product. | |