Commit 38bca68 by mataln (verified) · 1 parent: 23e1e8e

Update README.md

  1. README.md +4 -0
README.md CHANGED
@@ -15,6 +15,10 @@ This repository contains information about the multi-modal multi-label, wide are
  It contains around 40 TB of co-aligned machine learning ready data tiles, spanning 9 EO datasets and 6 geographic regions. For ease of access, the dataset has been compressed as parquet files.
  For a smaller (uncompressed) version of our dataset, check out the [M3LEO miniset](https://huggingface.co/M3LEO-miniset).

+ # Decompression
+ If you need to decompress the files, please see the main README at [the github repo](https://github.com/spaceml-org/M3LEO/).</br>
+ If you want to use them directly from the parquet files, the original .tif/.nc files were read into the rows as [binary file data sources](https://spark.apache.org/docs/3.5.3/sql-data-sources-binaryFile.html)
+
  ## Tile Definitions

  Each data tile covers an area of 4480m x 4480m (448x448 pixels at 10m/pixel) and is labelled with a unique identifier based on location.
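
Note on the added Decompression section: it says the original .tif/.nc files were read into the parquet rows as Spark binary file data sources. Spark's `binaryFile` source produces the columns `path`, `modificationTime`, `length` and `content`. Assuming the published parquet files keep those column names (this commit does not confirm it), a minimal Python sketch for writing the raw files back to disk might look like the following. The function names, the batch size and the choice of `pyarrow` are illustrative assumptions, not part of the M3LEO tooling.

```python
from pathlib import Path


def write_binary_rows(rows, out_dir):
    """Write each row's `content` bytes into out_dir, named after the last
    component of the row's `path` value.

    Assumes every row is a dict with a `path` string and a `content` bytes
    value, as in Spark's binaryFile schema.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row in rows:
        # `path` may be a URI such as file:/data/tile.tif; keep the file name only.
        name = row["path"].rsplit("/", 1)[-1]
        target = out_dir / name
        target.write_bytes(row["content"])
        written.append(target)
    return written


def iter_parquet_rows(parquet_path, batch_size=16):
    """Yield rows as dicts, reading the parquet file in small batches so that
    large binary payloads are not all held in memory at once."""
    import pyarrow.parquet as pq  # third-party dependency, imported lazily

    parquet_file = pq.ParquetFile(parquet_path)
    for batch in parquet_file.iter_batches(
        batch_size=batch_size, columns=["path", "content"]
    ):
        yield from batch.to_pylist()
```

A call such as `write_binary_rows(iter_parquet_rows("shard.parquet"), "tiles")` would then recreate the files, where `shard.parquet` and `tiles` are placeholder names. Rows that share a file name overwrite one another in this sketch, so a real extraction would likely need to preserve more of the original path.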