---
title: README
emoji: π
colorFrom: blue
colorTo: blue
sdk: static
pinned: false
---

# M3LEO: A Multi-Modal Multi-Label Earth Observation Dataset
This repository contains information about the multi-modal, multi-label, wide-area Earth Observation (EO) datasets collated during the [2023 Frontier Development Lab](https://fdleurope.org/fdl-europe-2023).
It contains around 40 TB of co-aligned, machine-learning-ready data tiles, spanning 9 EO datasets and 6 geographic regions. For ease of access, the dataset has been compressed as parquet files.
For a smaller (uncompressed) version of our dataset, check out the [M3LEO miniset](https://huggingface.co/M3LEO-miniset).<br>
[PAPER](https://arxiv.org/abs/2406.04230v2) @ NeurIPS 2024 D&B track
# Decompression
If you need to decompress the files, please see the main README at [the GitHub repo](https://github.com/spaceml-org/M3LEO/).<br>
If you want to use the data directly from the parquet files, note that each row holds one of the original .tif/.nc files, read in as a [binary file data source](https://spark.apache.org/docs/3.5.3/sql-data-sources-binaryFile.html).
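
As a rough sketch of working with the parquet files directly: Spark's binary file data source stores each file as a row with `path`, `modificationTime`, `length` and `content` columns, so the raw bytes of each original file should sit in `content`. The snippet below assumes that schema is preserved in these parquet files and that `pyarrow` is installed. The function names are hypothetical and not part of the M3LEO codebase.

```python
import os
from typing import Dict, Iterable, Iterator, List


def iter_rows(parquet_path: str, batch_size: int = 16) -> Iterator[Dict]:
    """Yield rows as dicts, a few at a time, so a large file is never loaded whole."""
    import pyarrow.parquet as pq  # third-party dependency, imported lazily

    parquet_file = pq.ParquetFile(parquet_path)
    for batch in parquet_file.iter_batches(batch_size=batch_size):
        yield from batch.to_pylist()


def dump_rows(rows: Iterable[Dict], out_dir: str) -> List[str]:
    """Write each row's `content` bytes to out_dir, named after the basename of `path`."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for row in rows:
        # Assumption: columns follow Spark's binaryFile schema (`path`, `content`).
        name = os.path.basename(row["path"])
        target = os.path.join(out_dir, name)
        with open(target, "wb") as handle:
            handle.write(row["content"])
        written.append(target)
    return written
```

Calling `dump_rows(iter_rows(parquet_path), out_dir)` should then leave the original .tif/.nc files in `out_dir`, where they can be opened with the usual raster or NetCDF tooling.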
## Tile Definitions
Each data tile covers an area of 4480m x 4480m (448x448 pixels at 10m/pixel) and is labelled with a unique identifier based on location.
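
The tile footprint follows directly from the pixel grid. A short worked calculation, using only the numbers stated above:

```python
TILE_PIXELS = 448    # pixels along each tile side
PIXEL_SIZE_M = 10    # metres per pixel

tile_side_m = TILE_PIXELS * PIXEL_SIZE_M      # side length in metres
tile_area_km2 = (tile_side_m / 1000) ** 2     # footprint in square kilometres

print(tile_side_m, round(tile_area_km2, 2))
```

Each tile therefore covers roughly 20 km².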

## Areas of Interest
Our areas of interest (AOIs) span China, CONUS (the contiguous United States), Europe, the Middle East, PAKIN, and South America.
Each AOI has an associated '.geojson' file containing the geometry and identifier of each data tile.
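
A minimal sketch of loading one of these files with the standard library, assuming it is a GeoJSON `FeatureCollection` with the tile identifier stored in each feature's properties. The property name `identifier` is a placeholder; check the actual files for the real key.

```python
import json
from typing import Dict


def load_tile_geometries(geojson_path: str, id_property: str = "identifier") -> Dict[str, dict]:
    """Map each tile identifier to its GeoJSON geometry."""
    with open(geojson_path, "r", encoding="utf-8") as handle:
        collection = json.load(handle)

    tiles = {}
    for feature in collection["features"]:
        # Assumption: the identifier is a feature property named `id_property`.
        tile_id = str(feature["properties"][id_property])
        tiles[tile_id] = feature["geometry"]
    return tiles
```

The returned dictionary can be used to look up the footprint of any tile by its identifier.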

## Train-Test-Validation Splits
For each geographic area, we provide '.csv' files with predefined train, test and validation splits that can be used for repeatability and comparability of experiments.
60% of tiles are allocated for training, 20% for validation, and 20% for testing.
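
A minimal sketch of reading a split file and checking the proportions, assuming a single CSV with one row per tile, a tile identifier column and a split column. The column names `identifier` and `split` are placeholders; the actual files may be laid out differently (for example, one file per split).

```python
import csv
from collections import Counter
from typing import Dict


def load_splits(csv_path: str, id_column: str = "identifier",
                split_column: str = "split") -> Dict[str, str]:
    """Map each tile identifier to its split name."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Assumption: the header row contains `id_column` and `split_column`.
        return {row[id_column]: row[split_column] for row in csv.DictReader(handle)}


def split_fractions(splits: Dict[str, str]) -> Dict[str, float]:
    """Return the share of tiles assigned to each split."""
    counts = Counter(splits.values())
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}
```

With the predefined splits, `split_fractions` should return values close to 0.6 for training and 0.2 each for validation and testing.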
## Temporal Coverage
M3LEO currently contains data from 2018 to 2020 for SAR amplitude and multi-spectral Sentinel-2 imagery. Other datasets are provided for 2020 only. Future iterations might extend the dataset to other years.
## Datasets
The M3LEO dataset spans 9 diverse EO data types, covering input EO imagery and associated labels.

#### Synthetic Aperture Radar Datasets
- [`s1grd-[2018-2020]`](https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S1_GRD): Sentinel-1 C-band SAR Ground Range Detected (GRD) amplitude data. Contains VV and VH polarization channels at 10m resolution, with seasonal median composites (4 seasons/year) for ascending and descending orbits.
- [`gssic`](https://asf.alaska.edu/datasets/derived/global-seasonal-sentinel-1-interferometric-coherence-and-backscatter-dataset/): Global Seasonal Sentinel-1 Interferometric Coherence and Backscatter dataset at 90m resolution.
- [`gunw-dateinit_dateend`](https://asf.alaska.edu/data-sets/derived-data-sets/sentinel-1-interferograms/): ARIA Sentinel-1 Geocoded Unwrapped Interferograms at 90m resolution. Date selection prioritizes maximum interferometric pair availability within the specified period [dateinit, dateend].
#### Optical Imagery
- [`s2srm-[2018-2020]`](https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR_HARMONIZED): Harmonized Sentinel-2 Level 2A surface reflectance data at 10m resolution. Includes 10 spectral bands (B2, B3, B4, B5, B6, B7, B8, B8A, B11, B12) as cloudless monthly medians for March, June, September, and December.
#### Labeled Datasets
- [`biomass-2020`](https://climate.esa.int/en/projects/biomass/): ESA Climate Change Initiative (CCI) Above Ground Biomass annual maps at 90m resolution.
- [`esaworldcover-2020`](https://developers.google.com/earth-engine/datasets/catalog/ESA_WorldCover_v100): ESA WorldCover land use/land cover maps at 10m resolution.
- [`modis44b006veg`](https://developers.google.com/earth-engine/datasets/catalog/MODIS_006_MOD44B): MODIS Vegetation Continuous Fields (VCF) annual maps at 250m resolution.
- [`ghsbuilts-2020`](https://human-settlement.emergency.copernicus.eu/download.php?ds=bu): European Commission JRC Global Human Settlement Layer (GHSL) Built-up Surface dataset at 100m resolution.
#### Digital Elevation Model
- [`srtmdem`](https://developers.google.com/earth-engine/datasets/catalog/CGIAR_SRTM90_V4): NASA Shuttle Radar Topography Mission (SRTM) digital elevation model at 30m resolution.
# Acknowledgements
This work has been enabled by [Frontier Development Lab Europe](https://fdleurope.org), a public/private partnership between the European Space Agency (ESA), Trillium Technologies, the University of Oxford and leaders in commercial AI, supported by Google Cloud and NVIDIA, developing open science for all humankind.