license: mit
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities (arXiv 2024)
Guillaume Astruc, Nicolas Gonthier, Clement Mallet, Loic Landrieu
Official models for AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities
Abstract
We introduce AnySat: a JEPA-based multimodal Earth Observation model that trains simultaneously on diverse datasets with different scales, resolutions (spatial, spectral, temporal), and modality combinations.
For more details and results, please see our GitHub repository and project page.
Datasets
| Dataset | Modalities | Labels | Link |
|---|---|---|---|
| PASTIS-HD | SPOT 6-7 (1m) + S1/S2 (30-140 acquisitions/year) | Crop mapping (0.2m) | Hugging Face or Zenodo |
| TreeSatAI-TS | Aerial (0.2m) + S1/S2 (10-70 acquisitions/year) | Forestry (60m) | Hugging Face |
| FLAIR | Aerial (0.2m) + S2 (20-114 acquisitions/year) | Land cover (0.2m) | Hugging Face |
Inference
To load our pretrained models, run:
```python
from models.huggingface import AnySat

# Load pretrained weights ("small" and "tiny" sizes are also available)
model = AnySat(size="base", pretrained=True)
```
To get features from an observation or a batch of observations, pass the model a dictionary whose keys are taken from the following list:
- "aerial": Single date tensor (Bx4xHxW) with 4 channels (RGB NIR), 0.2m resolution
- "aerial-flair": Single date tensor (Bx5xHxW) with 5 channels (RGB NIR Elevation), 0.2m resolution
- "spot": Single date tensor (Bx3xHxW) with 3 channels (RGB), 1m resolution
- "naip": Single date tensor (Bx4xHxW) with 4 channels (RGB NIR), 1.25m resolution
- "s2": Time series tensor (BxTx10xHxW) with 10 channels (B2 B3 B4 B5 B6 B7 B8 B8a B11 B12), 10m resolution
- "s1-asc": Time series tensor (BxTx2xHxW) with 2 channels (VV VH), 10m resolution
- "s1": Time series tensor (BxTx3xHxW) with 3 channels (VV VH Ratio), 10m resolution
- "alos": Time series tensor (BxTx3xHxW) with 3 channels (HH HV Ratio), 30m resolution
- "l7": Time series tensor (BxTx6xHxW) with 6 channels (B1 B2 B3 B4 B5 B7), 30m resolution
- "l8": Time series tensor (BxTx11xHxW) with 11 channels (B8 B1 B2 B3 B4 B5 B6 B7 B9 B10 B11), rescaled to 10m resolution
- "modis": Time series tensor (BxTx7xHxW) with 7 channels (B1 B2 B3 B4 B5 B6 B7), 250m resolution
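To catch shape mistakes before calling the model, the sketch below checks an input dictionary against the list above. `check_anysat_inputs` is a hypothetical helper, not part of AnySat: the channel counts are copied from the list, every time series is assumed to come with a matching "{key}_dates" tensor of shape BxT, and each value is assumed to be either an object with a `.shape` (such as a PyTorch tensor or NumPy array) or a plain shape tuple.

```python
# Hypothetical helper (not part of AnySat): validates keys and shapes of the
# input dictionary. Channel counts are copied from the list of input keys.
SINGLE_DATE = {"aerial": 4, "aerial-flair": 5, "spot": 3, "naip": 4}
TIME_SERIES = {"s2": 10, "s1-asc": 2, "s1": 3, "alos": 3,
               "l7": 6, "l8": 11, "modis": 7}


def _shape(x):
    """Return the shape of a tensor/array, or x itself if it is already a shape tuple."""
    return tuple(getattr(x, "shape", x))


def check_anysat_inputs(data):
    """Raise ValueError if `data` does not follow the expected key/shape layout."""
    batch_sizes = set()
    for key, value in data.items():
        if key.endswith("_dates"):
            continue  # dates are checked together with their time series below
        shape = _shape(value)
        if key in SINGLE_DATE:
            # Single date: (B, C, H, W)
            if len(shape) != 4 or shape[1] != SINGLE_DATE[key]:
                raise ValueError(f"{key}: expected (B, {SINGLE_DATE[key]}, H, W), got {shape}")
        elif key in TIME_SERIES:
            # Time series: (B, T, C, H, W) plus a (B, T) dates tensor
            if len(shape) != 5 or shape[2] != TIME_SERIES[key]:
                raise ValueError(f"{key}: expected (B, T, {TIME_SERIES[key]}, H, W), got {shape}")
            dates_key = f"{key}_dates"
            if dates_key not in data:
                raise ValueError(f"{key}: missing '{dates_key}' tensor of shape (B, T)")
            if _shape(data[dates_key]) != shape[:2]:
                raise ValueError(f"{dates_key}: expected shape {shape[:2]}, got {_shape(data[dates_key])}")
        else:
            raise ValueError(f"unknown modality key: {key}")
        batch_sizes.add(shape[0])
    # All modalities must describe the same batch of observations
    if len(batch_sizes) > 1:
        raise ValueError(f"inconsistent batch sizes: {sorted(batch_sizes)}")


# Example with plain shape tuples: a batch of 2 samples with an aerial image
# and a 12-date Sentinel-2 series.
check_anysat_inputs({
    "aerial": (2, 4, 300, 300),
    "s2": (2, 12, 10, 6, 6),
    "s2_dates": (2, 12),
})
```

Working on shapes rather than tensors keeps the check independent of the framework; with real tensors you can pass the dictionary unchanged.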
Each time series key requires a matching "{key}_dates" tensor (for example "s2_dates") of shape BxT, containing integers that represent the day of the year of each acquisition. Next, choose the scale at which to produce features: the scale argument is in meters and sets the side length of the output patches. The output is the concatenation of a class token and a flattened feature map in which each feature encodes a scale x scale zone. Then run:
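The two conventions above can be sketched with two hypothetical helpers (not part of AnySat). `day_of_year` builds the values of a "{key}_dates" tensor from calendar dates; it assumes a 1-based convention (January 1st = 1), which should be checked against the repository. `num_output_tokens` counts the output tokens (one class token plus one token per scale x scale zone), assuming a square tile whose side in meters is a multiple of `scale`.

```python
import datetime


def day_of_year(dates):
    """Convert acquisition dates to day-of-year integers.

    Assumes a 1-based convention (January 1st -> 1), as returned by
    `timetuple().tm_yday`.
    """
    return [d.timetuple().tm_yday for d in dates]


def num_output_tokens(tile_size_m, scale_m):
    """Number of output tokens: 1 class token + one token per scale x scale zone.

    Assumes a square tile whose side (integer meters) is a multiple of `scale_m`.
    """
    if tile_size_m % scale_m != 0:
        raise ValueError("tile size should be a multiple of the scale")
    side = tile_size_m // scale_m  # zones per tile side
    return 1 + side * side


# One row of an "s2_dates" tensor for three acquisitions in 2021
s2_dates_row = day_of_year([
    datetime.date(2021, 1, 1),
    datetime.date(2021, 3, 1),
    datetime.date(2021, 12, 31),
])
```

Under these assumptions, a 60m x 60m tile processed at scale=10 yields 1 + 6 x 6 = 37 tokens.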
```python
features = model(data, scale=scale)  # scale in meters
```
You can then use these features for your downstream task.
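Since the output is a class token followed by a flattened feature map, a downstream head usually needs to separate the two. The sketch below does this for one sample stored as a list of feature vectors; `split_features` is a hypothetical helper, and it assumes the feature map is square and flattened row by row, which should be verified against the repository.

```python
import math


def split_features(tokens):
    """Split one sample's output tokens into (class_token, feature_grid).

    `tokens` is a list of 1 + N feature vectors: the class token followed by
    a flattened feature map. Assumes the map is square and flattened row by row.
    """
    cls_token, patch_tokens = tokens[0], tokens[1:]
    side = math.isqrt(len(patch_tokens))
    if side * side != len(patch_tokens):
        raise ValueError("expected a square feature map")
    # Rebuild the 2D grid: grid[row][col] is the feature of one scale x scale zone
    grid = [patch_tokens[row * side:(row + 1) * side] for row in range(side)]
    return cls_token, grid
```

If the model returns a PyTorch tensor of shape (B, 1 + N, D) (an assumption), the equivalent is `features[:, 0]` for the class token and `features[:, 1:].reshape(B, side, side, D)` for the feature map.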
To obtain a feature map at the spatial density of a specific modality, specify:
```python
# modality: name of the modality whose density the feature map should follow
features = model(data, scale=scale, keep_subpatch=True, modality_keep=modality)
```
Note that the features will then have size 2*D, where D is the embedding dimension of the model. If several modalities share the desired resolution, pick the most informative one (or modify the code to also concatenate the other modalities).
To reproduce our results, add new modalities, or run further experiments, see the full code on GitHub.
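When choosing `modality_keep`, it helps to know which of the provided modalities share its resolution. The sketch below uses a hypothetical helper, `same_resolution_modalities`, with native resolutions copied from the list of input keys (Landsat 8 counted at its rescaled 10m); it ignores "{key}_dates" entries and unknown keys.

```python
# Native resolutions in meters, copied from the list of input keys
# ("l8" is listed as rescaled to 10m).
RESOLUTION_M = {
    "aerial": 0.2, "aerial-flair": 0.2, "spot": 1.0, "naip": 1.25,
    "s2": 10.0, "s1-asc": 10.0, "s1": 10.0, "alos": 30.0,
    "l7": 30.0, "l8": 10.0, "modis": 250.0,
}


def same_resolution_modalities(data_keys, modality_keep):
    """Return the provided modalities that share the resolution of `modality_keep`."""
    target = RESOLUTION_M[modality_keep]
    return sorted(k for k in data_keys
                  if k in RESOLUTION_M and RESOLUTION_M[k] == target)


# With aerial, S2 and ascending S1 inputs, keeping "s2" competes with "s1-asc"
candidates = same_resolution_modalities(
    ["aerial", "s2", "s2_dates", "s1-asc", "s1-asc_dates"], "s2")
```

If more than one modality is returned, only the one passed as `modality_keep` contributes its sub-patch density to the feature map.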
Citing