Transformers documentation

SuperGlue

SuperGlue is a neural network that matches two sets of local features by jointly finding correspondences and rejecting non-matchable points. Assignments are estimated by solving a differentiable optimal transport problem, whose costs are predicted by a graph neural network. SuperGlue introduces a flexible context aggregation mechanism based on attention, enabling it to reason about the underlying 3D scene and feature assignments jointly. Paired with the SuperPoint model, it can be used to match two images and estimate the pose between them. This makes SuperGlue useful for tasks such as image matching and homography estimation.

You can find all the original SuperGlue checkpoints under the Magic Leap Community organization.

This model was contributed by stevenbucaille.

Click on the SuperGlue models in the right sidebar for more examples of how to apply SuperGlue to different computer vision tasks.

The example below demonstrates how to match keypoints between two images with the AutoModel class.

from transformers import AutoImageProcessor, AutoModel
import torch
from PIL import Image
import requests

url_image1 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_98169888_3347710852.jpg"
image1 = Image.open(requests.get(url_image1, stream=True).raw)
url_image2 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_26757027_6717084061.jpg"
image2 = Image.open(requests.get(url_image2, stream=True).raw)

images = [image1, image2]

processor = AutoImageProcessor.from_pretrained("magic-leap-community/superglue_outdoor")
model = AutoModel.from_pretrained("magic-leap-community/superglue_outdoor")

inputs = processor(images, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Post-process to get keypoints and matches
image_sizes = [[(image.height, image.width) for image in images]]
processed_outputs = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
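
The matched keypoints can then be passed to a robust estimator to recover the geometric relation between the two views, such as the homography or relative pose mentioned above. The snippet below is a minimal sketch using OpenCV (cv2, an extra dependency not required by Transformers) to fit a homography with RANSAC on the matches kept by the 0.2 threshold.

import cv2
import numpy as np

# Matched keypoints of the first (and only) image pair, filtered by the threshold above
output = processed_outputs[0]
points0 = output["keypoints0"].numpy().astype(np.float32)
points1 = output["keypoints1"].numpy().astype(np.float32)

# Robustly estimate the homography mapping image1 onto image2 (requires at least 4 matches)
homography, inlier_mask = cv2.findHomography(points0, points1, cv2.RANSAC, 5.0)
print(f"{int(inlier_mask.sum())}/{len(points0)} matches are RANSAC inliers")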

Notes

  • SuperGlue performs feature matching between two images simultaneously, requiring pairs of images as input.

    from transformers import AutoImageProcessor, AutoModel
    import torch
    from PIL import Image
    import requests
    
    processor = AutoImageProcessor.from_pretrained("magic-leap-community/superglue_outdoor")
    model = AutoModel.from_pretrained("magic-leap-community/superglue_outdoor")
    
    # SuperGlue requires pairs of images
    url_image1 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_98169888_3347710852.jpg"
    image1 = Image.open(requests.get(url_image1, stream=True).raw)
    url_image2 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_26757027_6717084061.jpg"
    image2 = Image.open(requests.get(url_image2, stream=True).raw)
    images = [image1, image2]
    
    inputs = processor(images, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    
    # Extract matching information (see KeypointMatchingOutput below for the exact shapes)
    keypoints = outputs.keypoints              # Keypoint coordinates detected in the images
    matches = outputs.matches                  # Index of the matched keypoint in the other image
    matching_scores = outputs.matching_scores  # Confidence score of each match
  • The model outputs matching indices, keypoints, and confidence scores for each match (see the raw-output sketch at the end of this list).

  • For better visualization and analysis, use the SuperGlueImageProcessor.post_process_keypoint_matching() method to get matches in a more readable format.

    # Process outputs for visualization
    image_sizes = [[(image.height, image.width) for image in images]]
    processed_outputs = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
    
    for i, output in enumerate(processed_outputs):
        print(f"For the image pair {i}")
        for keypoint0, keypoint1, matching_score in zip(
                output["keypoints0"], output["keypoints1"], output["matching_scores"]
        ):
            print(f"Keypoint at {keypoint0.numpy()} matches with keypoint at {keypoint1.numpy()} with score {matching_score}")
  • The example below demonstrates how to visualize matches between two images.

    import matplotlib.pyplot as plt
    import numpy as np
    
    # Create side by side image
    merged_image = np.zeros((max(image1.height, image2.height), image1.width + image2.width, 3))
    merged_image[: image1.height, : image1.width] = np.array(image1) / 255.0
    merged_image[: image2.height, image1.width :] = np.array(image2) / 255.0
    plt.imshow(merged_image)
    plt.axis("off")
    
    # Retrieve the keypoints and matches
    output = processed_outputs[0]
    keypoints0 = output["keypoints0"]
    keypoints1 = output["keypoints1"]
    matching_scores = output["matching_scores"]
    
    # Plot the matches
    for keypoint0, keypoint1, matching_score in zip(keypoints0, keypoints1, matching_scores):
        plt.plot(
            [keypoint0[0], keypoint1[0] + image1.width],
            [keypoint0[1], keypoint1[1]],
            color=plt.get_cmap("RdYlGn")(matching_score.item()),
            alpha=0.9,
            linewidth=0.5,
        )
        plt.scatter(keypoint0[0], keypoint0[1], c="black", s=2)
        plt.scatter(keypoint1[0] + image1.width, keypoint1[1], c="black", s=2)
    
    plt.savefig("matched_image.png", dpi=300, bbox_inches='tight')
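  • The raw outputs (before post-processing) follow the KeypointMatchingOutput format documented below: matches and matching_scores contain one row per image of each pair. A minimal sketch, assuming the original SuperGlue convention that -1 marks an unmatched keypoint:

    # Continues from the pair-matching example above
    matches_image0 = outputs.matches[0, 0]             # indices into the keypoints of the second image
    num_matched = (matches_image0 != -1).sum().item()  # -1 is assumed to mark keypoints without a match
    print(f"{num_matched} keypoints of the first image were matched in the second image")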

Resources

SuperGlueConfig

class transformers.SuperGlueConfig

( keypoint_detector_config: SuperPointConfig = None hidden_size: int = 256 keypoint_encoder_sizes: typing.Optional[list[int]] = None gnn_layers_types: typing.Optional[list[str]] = None num_attention_heads: int = 4 sinkhorn_iterations: int = 100 matching_threshold: float = 0.0 initializer_range: float = 0.02 **kwargs )

Parameters

  • keypoint_detector_config (Union[AutoConfig, dict], optional, defaults to SuperPointConfig) — The config object or dictionary of the keypoint detector.
  • hidden_size (int, optional, defaults to 256) — The dimension of the descriptors.
  • keypoint_encoder_sizes (list[int], optional, defaults to [32, 64, 128, 256]) — The sizes of the keypoint encoder layers.
  • gnn_layers_types (list[str], optional, defaults to ['self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross']) — The types of the GNN layers. Must be either ‘self’ or ‘cross’.
  • num_attention_heads (int, optional, defaults to 4) — The number of heads in the GNN layers.
  • sinkhorn_iterations (int, optional, defaults to 100) — The number of Sinkhorn iterations.
  • matching_threshold (float, optional, defaults to 0.0) — The matching threshold.
  • initializer_range (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.

This is the configuration class to store the configuration of a SuperGlueModel. It is used to instantiate a SuperGlue model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the SuperGlue magic-leap-community/superglue_indoor architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Examples:

>>> from transformers import SuperGlueConfig, SuperGlueModel

>>> # Initializing a SuperGlue (superglue_indoor style) configuration
>>> configuration = SuperGlueConfig()

>>> # Initializing a model from the configuration
>>> model = SuperGlueModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
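
The defaults can also be overridden to build a custom, randomly initialized variant. A hypothetical sketch (the values below are illustrative and do not correspond to a released checkpoint):

>>> from transformers import SuperGlueConfig, SuperGlueForKeypointMatching

>>> # A lighter variant: fewer attentional GNN layers and Sinkhorn iterations
>>> configuration = SuperGlueConfig(
...     gnn_layers_types=["self", "cross"] * 4,
...     sinkhorn_iterations=50,
...     matching_threshold=0.2,
... )
>>> model = SuperGlueForKeypointMatching(configuration)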

SuperGlueImageProcessor

class transformers.SuperGlueImageProcessor

( do_resize: bool = True size: typing.Optional[dict[str, int]] = None resample: Resampling = <Resampling.BILINEAR: 2> do_rescale: bool = True rescale_factor: float = 0.00392156862745098 do_grayscale: bool = True **kwargs )

Parameters

  • do_resize (bool, optional, defaults to True) — Controls whether to resize the image’s (height, width) dimensions to the specified size. Can be overridden by do_resize in the preprocess method.
  • size (dict[str, int], optional, defaults to {"height": 480, "width": 640}) — Resolution of the output image after resize is applied. Only has an effect if do_resize is set to True. Can be overridden by size in the preprocess method.
  • resample (PILImageResampling, optional, defaults to Resampling.BILINEAR) — Resampling filter to use if resizing the image. Can be overridden by resample in the preprocess method.
  • do_rescale (bool, optional, defaults to True) — Whether to rescale the image by the specified scale rescale_factor. Can be overridden by do_rescale in the preprocess method.
  • rescale_factor (int or float, optional, defaults to 1/255) — Scale factor to use if rescaling the image. Can be overridden by rescale_factor in the preprocess method.
  • do_grayscale (bool, optional, defaults to True) — Whether to convert the image to grayscale. Can be overridden by do_grayscale in the preprocess method.

Constructs a SuperGlue image processor.

post_process_keypoint_matching

( outputs: KeypointMatchingOutput target_sizes: typing.Union[transformers.utils.generic.TensorType, list[tuple]] threshold: float = 0.0 ) list[Dict]

Parameters

  • outputs (KeypointMatchingOutput) — Raw outputs of the model.
  • target_sizes (torch.Tensor or list[tuple[tuple[int, int]]], optional) — Tensor of shape (batch_size, 2, 2) or list of tuples of tuples (tuple[int, int]) containing the target size (height, width) of each image in the batch. This must be the original image size (before any processing).
  • threshold (float, optional, defaults to 0.0) — Threshold to filter out the matches with low scores.

Returns

list[Dict]

A list of dictionaries, each dictionary containing the keypoints in the first and second image of the pair, the matching scores and the matching indices.

Converts the raw output of KeypointMatchingOutput into lists of keypoints, matching scores and matching indices with coordinates absolute to the original image sizes.

preprocess

( images do_resize: typing.Optional[bool] = None size: typing.Optional[dict[str, int]] = None resample: Resampling = None do_rescale: typing.Optional[bool] = None rescale_factor: typing.Optional[float] = None do_grayscale: typing.Optional[bool] = None return_tensors: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None data_format: ChannelDimension = <ChannelDimension.FIRST: 'channels_first'> input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs )

Parameters

  • images (ImageInput) — Image pairs to preprocess. Expects either a list of 2 images or a list of lists of 2 images, with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale=False.
  • do_resize (bool, optional, defaults to self.do_resize) — Whether to resize the image.
  • size (dict[str, int], optional, defaults to self.size) — Size of the output image after resize has been applied, in the format {"height": int, "width": int}. Only has an effect if do_resize is set to True.
  • resample (PILImageResampling, optional, defaults to self.resample) — Resampling filter to use if resizing the image. This can be one of the PILImageResampling filters. Only has an effect if do_resize is set to True.
  • do_rescale (bool, optional, defaults to self.do_rescale) — Whether to rescale the image values between [0 - 1].
  • rescale_factor (float, optional, defaults to self.rescale_factor) — Rescale factor to rescale the image by if do_rescale is set to True.
  • do_grayscale (bool, optional, defaults to self.do_grayscale) — Whether to convert the image to grayscale.
  • return_tensors (str or TensorType, optional) — The type of tensors to return. Can be one of:
    • Unset: Return a list of np.ndarray.
    • TensorType.TENSORFLOW or 'tf': Return a batch of type tf.Tensor.
    • TensorType.PYTORCH or 'pt': Return a batch of type torch.Tensor.
    • TensorType.NUMPY or 'np': Return a batch of type np.ndarray.
    • TensorType.JAX or 'jax': Return a batch of type jax.numpy.ndarray.
  • data_format (ChannelDimension or str, optional, defaults to ChannelDimension.FIRST) — The channel dimension format for the output image. Can be one of:
    • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
    • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
    • Unset: Use the channel dimension format of the input image.
  • input_data_format (ChannelDimension or str, optional) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
    • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
    • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
    • "none" or ChannelDimension.NONE: image in (height, width) format.

Preprocess an image or batch of images.
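
For example, several pairs can be preprocessed at once by passing a list of 2-image lists. A short sketch, reusing image1 and image2 loaded in the examples above (the exact tensor shape depends on the resize and grayscale settings):

from transformers import SuperGlueImageProcessor

processor = SuperGlueImageProcessor()  # defaults: resize to 480x640, rescale to [0, 1], convert to grayscale

# A batch of two pairs; each inner list holds exactly 2 images
batch = [[image1, image2], [image2, image1]]
inputs = processor(batch, return_tensors="pt")
print(inputs["pixel_values"].shape)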

resize

( image: ndarray size: dict data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None **kwargs )

Parameters

  • image (np.ndarray) — Image to resize.
  • size (dict[str, int]) — Dictionary of the form {"height": int, "width": int}, specifying the size of the output image.
  • data_format (ChannelDimension or str, optional) — The channel dimension format of the output image. If not provided, it will be inferred from the input image. Can be one of:
    • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
    • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
    • "none" or ChannelDimension.NONE: image in (height, width) format.
  • input_data_format (ChannelDimension or str, optional) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
    • "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
    • "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
    • "none" or ChannelDimension.NONE: image in (height, width) format.

Resize an image.

SuperGlueForKeypointMatching

class transformers.SuperGlueForKeypointMatching

( config: SuperGlueConfig )

Parameters

  • config (SuperGlueConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

SuperGlue model taking pairs of images as input and outputting keypoint matches between them.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward

( pixel_values: FloatTensor labels: typing.Optional[torch.LongTensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) transformers.models.superglue.modeling_superglue.KeypointMatchingOutput or tuple(torch.FloatTensor)

Parameters

  • pixel_values (torch.FloatTensor of shape (batch_size, num_channels, image_size, image_size)) — The tensors corresponding to the input images. Pixel values can be obtained using SuperGlueImageProcessor. See SuperGlueImageProcessor.__call__() for details.
  • labels (torch.LongTensor, optional) — Not supported for this model; SuperGlue cannot currently be trained through Transformers, so no labels should be passed.
  • output_attentions (bool, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
  • output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
  • return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.

Returns

transformers.models.superglue.modeling_superglue.KeypointMatchingOutput or tuple(torch.FloatTensor)

A transformers.models.superglue.modeling_superglue.KeypointMatchingOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (SuperGlueConfig) and inputs.

  • loss (torch.FloatTensor of shape (1,), optional) — Loss computed during training.
  • matches (torch.FloatTensor of shape (batch_size, 2, num_matches)) — Index of keypoint matched in the other image.
  • matching_scores (torch.FloatTensor of shape (batch_size, 2, num_matches)) — Scores of predicted matches.
  • keypoints (torch.FloatTensor of shape (batch_size, num_keypoints, 2)) — Absolute (x, y) coordinates of predicted keypoints in a given image.
  • mask (torch.IntTensor of shape (batch_size, num_keypoints)) — Mask indicating which values in matches and matching_scores are keypoint matching information.
  • hidden_states (tuple[torch.FloatTensor, ...], optional) — Tuple of torch.FloatTensor (one for the output of each stage) of shape (batch_size, 2, num_channels, num_keypoints), returned when output_hidden_states=True is passed or when config.output_hidden_states=True.
  • attentions (tuple[torch.FloatTensor, ...], optional) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, 2, num_heads, num_keypoints, num_keypoints), returned when output_attentions=True is passed or when config.output_attentions=True.

The SuperGlueForKeypointMatching forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Examples:

>>> from transformers import AutoImageProcessor, AutoModel
>>> import torch
>>> from PIL import Image
>>> import requests

>>> url = "https://github.com/magicleap/SuperGluePretrainedNetwork/blob/master/assets/phototourism_sample_images/london_bridge_78916675_4568141288.jpg?raw=true"
>>> image1 = Image.open(requests.get(url, stream=True).raw)
>>> url = "https://github.com/magicleap/SuperGluePretrainedNetwork/blob/master/assets/phototourism_sample_images/london_bridge_19481797_2295892421.jpg?raw=true"
>>> image2 = Image.open(requests.get(url, stream=True).raw)
>>> images = [image1, image2]

>>> processor = AutoImageProcessor.from_pretrained("magic-leap-community/superglue_outdoor")
>>> model = AutoModel.from_pretrained("magic-leap-community/superglue_outdoor")

>>> with torch.no_grad():
...     inputs = processor(images, return_tensors="pt")
...     outputs = model(**inputs)
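
Continuing the example, the raw outputs can be converted into per-pair lists of matched keypoints with the image processor, as shown earlier on this page:

>>> image_sizes = [[(image.height, image.width) for image in images]]
>>> processed = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
>>> print(f"{len(processed[0]['keypoints0'])} matches kept for the image pair")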