---
license: openrail
datasets:
  - DarthReca/crisislandmark
language:
  - en
library_name: torchgeo
tags:
  - remote-sensing
  - text-to-image-retrieval
  - multimodal
  - geospatial
  - SAR
  - multispectral
  - crisis-management
  - earth-observation
  - contrastive-learning
---

CLOSP

CLOSP (Contrastive Language Optical SAR Pretraining) is a multimodal architecture designed for text-to-image retrieval. It creates a unified embedding space for text, Sentinel-2 (MSI), and Sentinel-1 (SAR) data.
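Because text, SAR, and MSI embeddings share one space, retrieval reduces to ranking image embeddings by their similarity to a text embedding. The sketch below shows that ranking step only. It assumes cosine similarity as the scoring function. `rank_images` is a hypothetical helper, not part of this repository. The random tensors stand in for real encoder outputs, since the encoders' loading and inference API is not described here.

```python
import torch
import torch.nn.functional as F


def rank_images(text_emb: torch.Tensor, image_embs: torch.Tensor, top_k: int = 5):
    """Hypothetical helper: rank gallery images by cosine similarity to a text query.

    text_emb:   (D,) embedding assumed to come from the text encoder.
    image_embs: (N, D) embeddings assumed to come from the SAR or MSI encoder.
    Returns (scores, indices) of the top_k most similar images, best first.
    """
    # L2-normalize so that a dot product equals cosine similarity.
    q = F.normalize(text_emb, dim=-1)
    g = F.normalize(image_embs, dim=-1)
    sims = g @ q  # (N,) similarity of each image to the query
    k = min(top_k, sims.numel())
    return torch.topk(sims, k)


# Placeholder embeddings; real ones would come from the CLOSP encoders.
torch.manual_seed(0)
dim = 8
gallery = torch.randn(10, dim)                 # image embeddings (SAR or MSI)
query = gallery[3] + 0.01 * torch.randn(dim)   # a query lying very close to image 3
scores, idx = rank_images(query, gallery, top_k=3)
print(idx.tolist())  # image 3 is ranked first
```

Normalizing both sides keeps the score independent of embedding magnitude. This is the usual choice for contrastively trained embedding spaces.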

This repository contains all the separate visual encoders in PyTorch format.

Model Details

The model uses three separate encoders: one for text, one for Sentinel-1 (SAR) data, and one for Sentinel-2 (MSI) data. During training, it uses a contrastive objective to align the textual embeddings with the corresponding visual embeddings (either SAR or MSI).
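The card does not specify the exact contrastive loss. A common choice for aligning text with images is the symmetric InfoNCE (CLIP-style) objective, sketched below as an assumption rather than as CLOSP's actual training code. In each batch, row *i* of the text embeddings and row *i* of the visual embeddings (from either the SAR or the MSI encoder) form the positive pair, and all other rows act as negatives. The function name `clip_style_loss` and the temperature of 0.07 are illustrative choices, not values taken from the paper.

```python
import torch
import torch.nn.functional as F


def clip_style_loss(text_emb: torch.Tensor, image_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of matched (text, image) pairs.

    Assumption: row i of text_emb and row i of image_emb are a positive pair;
    every other row in the batch is treated as a negative.
    """
    t = F.normalize(text_emb, dim=-1)
    v = F.normalize(image_emb, dim=-1)
    logits = t @ v.T / temperature                  # (B, B) scaled cosine similarities
    targets = torch.arange(t.size(0), device=t.device)
    loss_t2i = F.cross_entropy(logits, targets)     # text -> image direction
    loss_i2t = F.cross_entropy(logits.T, targets)   # image -> text direction
    return (loss_t2i + loss_i2t) / 2


torch.manual_seed(0)
texts = torch.randn(4, 16)
aligned = clip_style_loss(texts, texts + 0.01 * torch.randn(4, 16))
shuffled = clip_style_loss(texts, texts.roll(1, dims=0))
print(aligned.item() < shuffled.item())  # True: correctly matched pairs give a lower loss
```

Averaging the two directions trains the text and visual encoders symmetrically. This suits a model meant to retrieve images from text queries.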

Developed by: Daniele Rege Cambrin
Model type: CLOSP
Language(s) (NLP): English
License: OpenRAIL
Repository: GitHub
Paper: arXiv (https://arxiv.org/abs/2507.10403)

Citation

@misc{cambrin2025texttoremotesensingimageretrievalrgbsources,
      title={Text-to-Remote-Sensing-Image Retrieval beyond RGB Sources}, 
      author={Daniele Rege Cambrin and Lorenzo Vaiani and Giuseppe Gallipoli and Luca Cagliero and Paolo Garza},
      year={2025},
      eprint={2507.10403},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.10403}, 
}