arxiv:2311.00566

CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders

Published on Nov 1, 2023

Authors:

Abstract

A vital and rapidly growing application, remote sensing offers vast yet sparsely labeled, spatially aligned multimodal data; this makes self-supervised learning algorithms invaluable. We present CROMA: a framework that combines contrastive and reconstruction self-supervised objectives to learn rich unimodal and multimodal representations. Our method separately encodes masked-out multispectral optical and synthetic aperture radar samples -- aligned in space and time -- and performs cross-modal contrastive learning. Another encoder fuses these sensors, producing joint multimodal encodings that are used to predict the masked patches via a lightweight decoder. We show that these objectives are complementary when leveraged on spatially aligned multimodal data. We also introduce X- and 2D-ALiBi, which spatially bias our cross- and self-attention matrices. These strategies improve representations and allow our models to effectively extrapolate to images up to 17.6x larger at test-time. CROMA outperforms the current SoTA multispectral model, evaluated on: four classification benchmarks -- finetuning (avg. 1.8%), linear (avg. 2.4%) and nonlinear (avg. 1.4%) probing, kNN classification (avg. 3.5%), and K-means clustering (avg. 8.4%); and three segmentation benchmarks (avg. 6.4%). CROMA's rich, optionally multimodal representations can be widely leveraged across remote sensing applications.
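The abstract does not spell out how the attention matrices are spatially biased. As a rough illustration only, the sketch below assumes an ALiBi-style penalty: each attention logit is reduced in proportion to the Euclidean distance between the query and key patches on the image grid. The function names, the single `slope` value, and the 3x3 grid are hypothetical choices for this example and are not taken from the paper.

```python
import math


def alibi_2d_bias(grid_size, slope):
    """Build an N x N bias matrix for N = grid_size**2 patches.

    Entry [q][k] is -slope * (Euclidean distance between the grid
    positions of query patch q and key patch k). The distance metric
    and the single slope are assumptions made for this sketch.
    """
    coords = [(r, c) for r in range(grid_size) for c in range(grid_size)]
    return [
        [-slope * math.hypot(qr - kr, qc - kc) for (kr, kc) in coords]
        for (qr, qc) in coords
    ]


def biased_softmax(scores, bias_row):
    """Softmax over one row of attention scores after adding the bias."""
    logits = [s + b for s, b in zip(scores, bias_row)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3x3 patch grid with an arbitrary slope.
bias = alibi_2d_bias(grid_size=3, slope=0.5)

# With uniform raw scores, the bias alone decides the weights:
# the centre patch (index 4) attends most strongly to itself.
weights = biased_softmax([0.0] * 9, bias[4])
```

Because a bias of this kind depends only on relative patch positions, the same function can be called with a larger `grid_size` at test time without any learned position table, which is one plausible reading of why such biases help with extrapolation to larger images.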

