---
license: mit
---

# OmniSat: Self-Supervised Modality Fusion for Earth Observation (ECCV 2024)

[Guillaume Astruc](https://gastruc.github.io/), [Nicolas Gonthier](https://ngonthier.github.io/), [Clement Mallet](https://www.umr-lastig.fr/clement-mallet/), [Loic Landrieu](https://loiclandrieu.com/)

Official models for [_OmniSat: Self-Supervised Modality Fusion for Earth Observation_](https://arxiv.org/pdf/2404.08351.pdf)

## Abstract

We introduce OmniSat, a novel architecture that exploits the spatial alignment between multiple EO modalities to learn expressive multimodal representations without labels. We demonstrate the advantages of combining modalities of different natures across three downstream tasks (forestry, land cover classification, and crop mapping), and propose two augmented datasets with new modalities: PASTIS-HD and TreeSatAI-TS.

For more details and results, please check out our [github](https://github.com/gastruc/OmniSat) and [project page](https://gastruc.github.io/projects/omnisat.html).

## Datasets

| Dataset name | Modalities | Labels | Link |
| ------------- | ---------------------------------------- | ------------------- | ------------------- |
| PASTIS-HD | **SPOT 6-7 (1m)** + S1/S2 (30-140 / year) | Crop mapping (0.2m) | [huggingface](https://huggingface.co/datasets/IGNF/PASTIS-HD) or [zenodo](https://zenodo.org/records/10908628) |
| TreeSatAI-TS | Aerial (0.2m) + **S1/S2 (10-70 / year)** | Forestry (60m) | [huggingface](https://huggingface.co/datasets/IGNF/TreeSatAI-Time-Series) |
| FLAIR | Aerial (0.2m) + S2 (20-114 / year) | Land cover (0.2m) | [huggingface](https://huggingface.co/datasets/IGNF/FLAIR) |

### Inference 🔥

To load our pretrained models, run:

```python
from models.huggingface import AnySat

# Load the pretrained weights; "small" and "tiny" sizes are also available
model = AnySat(size="base", pretrained=True)
```

To get features from an observation or a batch of observations, provide the model with a dictionary whose keys are taken from the following list:

- "aerial": Single date tensor (Bx4xHxW) with 4 channels (RGB NiR), 0.2m resolution
- "aerial-flair": Single date tensor (Bx5xHxW) with 5 channels (RGB NiR Elevation), 0.2m resolution
- "spot": Single date tensor (Bx3xHxW) with 3 channels (RGB), 1m resolution
- "naip": Single date tensor (Bx4xHxW) with 3 channels (RGB), 1.25m resolution
- "s2": Time series tensor (BxTx10xHxW) with 10 channels (B0,B1???), 10m resolution
- "s1-asc": Time series tensor (BxTx2xHxW) with 2 channels (VV VH), 10m resolution
- "s1": Time series tensor (BxTx3xHxW) with 3 channels, 10m resolution
- "alos": Time series tensor (BxTx3xHxW) with 3 channels, 30m resolution
- "l7": Time series tensor (BxTx6xHxW) with 6 channels, 30m resolution
- "l8": Time series tensor (BxTx11xHxW) with 11 channels, rescaled to 10m resolution
- "modis": Time series tensor (BxTx7xHxW) with 7 channels, 250m resolution

Each time series key requires a matching "{key}_dates" tensor (for example "s2_dates") of size BxT whose values are integers giving the day of the year of each acquisition.

Then, you can run:

```python
# Call the model instance created above on the input dictionary
features = model(data)
```

You can then use these features for the desired downstream task.

To reproduce results, add new modalities, or run more experiments, see the full code on [github](https://github.com/gastruc/AnySat).

### Citing 💫

```bibtex
```
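
### Checking the input dictionary (sketch)

The input format described in the Inference section is easy to get wrong, so it can help to check shapes before calling the model. The snippet below is a minimal sketch, not part of the released code: `check_input_shapes` is a hypothetical helper, and the channel counts are copied from the modality list in the Inference section. "naip" is left out because that list gives both a 4-channel shape and 3 channels for it.

```python
# Channel counts copied from the modality list in the Inference section.
# Single-date inputs are (B, C, H, W); time series are (B, T, C, H, W).
SINGLE_DATE = {"aerial": 4, "aerial-flair": 5, "spot": 3}
TIME_SERIES = {"s2": 10, "s1-asc": 2, "s1": 3, "alos": 3,
               "l7": 6, "l8": 11, "modis": 7}


def check_input_shapes(shapes):
    """Hypothetical helper: list the problems found in a {key: shape} mapping."""
    problems = []
    for key, shape in shapes.items():
        if key.endswith("_dates"):
            continue  # date shapes are checked together with their series
        if key in SINGLE_DATE:
            channels = SINGLE_DATE[key]
            if len(shape) != 4 or shape[1] != channels:
                problems.append(f"{key}: expected (B, {channels}, H, W), got {shape}")
        elif key in TIME_SERIES:
            channels = TIME_SERIES[key]
            if len(shape) != 5 or shape[2] != channels:
                problems.append(f"{key}: expected (B, T, {channels}, H, W), got {shape}")
                continue
            dates = shapes.get(f"{key}_dates")
            if dates is None:
                problems.append(f"{key}: missing '{key}_dates' entry")
            elif tuple(dates) != (shape[0], shape[1]):
                problems.append(f"{key}_dates: expected {(shape[0], shape[1])}, got {tuple(dates)}")
        else:
            problems.append(f"{key}: key not covered by this sketch")
    return problems


# Example: a batch of 2 SPOT images plus a 12-date Sentinel-2 series.
example = {"spot": (2, 3, 60, 60), "s2": (2, 12, 10, 6, 6), "s2_dates": (2, 12)}
print(check_input_shapes(example))
```

With PyTorch tensors, the shapes can be collected with `{k: tuple(v.shape) for k, v in data.items()}` before passing them to the helper.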