metadata
license: apache-2.0
language:
  - en

ITI: Instrument to Instrument Translation

Overview

The ITI (Instrument to Instrument Translation) tool supports instrument-to-instrument translation of images across different domains using Artificial Intelligence (AI). Documentation with example notebooks is available in the ITI documentation and on GitHub.

Heliophysics Use Cases:

ITI has been applied to eight different Heliophysics use cases and implements a framework for loading different solar observatory data, including

  1. SDO/AIA+HMI:
  2. Hinode/SOT:
  3. SOHO/EIT+MDI:
  4. STEREO/EUVI:
  5. KSO (H-alpha):
  6. EUI/FSI to SDO/AIA: Combining both missions into a unified data set facilitates co-temporal usage, enabling study of solar events from multiple viewpoints.
  7. EUI/HRI to SDO/AIA: Super-resolution
  8. PROBA2/SWAP to SDO/AIA: Enhanced SWAP observations provide more detail at full resolution as well as at smaller scales.

Heliophysics Downstream Users:

  1. Image enhancement. The quality of space-based telescopes has improved in recent years thanks to technological advances. However, solar variability studies need joint data products. With unpaired image-to-image translation, ITI can restore and enhance solar observations for combined usage.
  2. Instrument intercalibration. Multi-viewpoint observations are necessary for detailed studies of solar phenomena. The intercalibration of multiple instruments enables combined studies from different viewpoints, e.g. Solar Orbiter and SDO.
  3. Super-resolution. Studying small-scale features requires observations with high spatial resolution. By translating SDO/AIA observations to HRI observations taken at perihelion, we can provide super-resolved full-Sun observations from the SDO perspective.

Heliophysics Models

We provide 4 trained models for heliophysics case studies, enabling image enhancement and instrument intercalibration for:

  • SOHO/EIT to SDO/AIA
  • STEREO/EUVI to SDO/AIA
  • PROBA2/SWAP to SDO/AIA
  • Solar Orbiter/EUI to SDO/AIA

Earth Observation Use Cases:

ITI has been extended to Earth Observation use cases, mainly the intercalibration of geostationary satellites:

  • Meteosat Second Generation Spinning Enhanced Visible and Infrared Imager (MSG/SEVIRI) to Geostationary Operational Environmental Satellite Advanced Baseline Imager (GOES/ABI)

Earth Observation Downstream Users:

  1. Feature Extraction/Foundation Models. The rs-tools package provides analysis-ready data (clean, harmonized scenes) and ML-ready data (clean patches), which can be applied to a wide range of use cases involving Level-1 data. Users can train their own general feature extractors (i.e., foundation models) or, once these are trained, apply the pretrained weights to downstream tasks such as classification, segmentation or estimation. As an example, rs-tools has been successfully applied to preprocess data for 3D cloud reconstruction, through the 2024 Earth Systems Lab 3D Cloud project.
  2. Intercalibration. Intercalibration of geostationary satellites would enable creating (near-)global datasets of the atmosphere that could be used to study cloud evolution at high temporal resolution across the Earth. This could be useful for a variety of research areas, including the Cloud/Aerosol community, and would provide more robust observational constraints for global (high-resolution) climate models.
  3. Super-resolution. Similar to the heliophysics applications, ITI can also be used for super-resolution, considering for instance the 1 km resolution of GOES/ABI compared to the 3 km resolution of MSG/SEVIRI. While this hasn't been explored in detail yet, our framework directly supports this application.

Earth Observation Models

We provide a training specification for the outlined EO-specific use case. For users interested in performing their own training, we provide configuration files on GitHub for easy experimentation.

Model training is still ongoing. Once finished, we will showcase logged callbacks including loss curves, metrics and images of predictions during the training phase.

  • MSG/SEVIRI to GOES/ABI (infrared channels)
  • MSG/SEVIRI to GOES/ABI (all channels)
  • GOES/ABI to MSG/SEVIRI (infrared channels)
  • GOES/ABI to MSG/SEVIRI (all channels)

Software Design and Development

The software for the ITI tool is configured to work with use cases in Heliophysics and Earth Observation. It follows standard software engineering practices and comprises the following components: a) data ingestion, preprocessing, and analysis-ready data, b) model training, c) model deployment and inference, d) visualization, e) validation and analysis, and f) documentation and tutorials.

Data Ingestion, Preprocessing and Analysis-Ready Data

Heliophysics:

The Heliophysics pipeline provides download routines, preprocessing editors, ML-ready datasets, training and evaluation modules for all instruments and case studies used so far.

Downloading: Python scripts to download data from an external database, including custom data downloaders such as the ITI data-download script for the SDO module for all use cases. We also have some publicly available test data which will allow users to play around with a subset of the datasets available without downloading the entire database.

Preprocessing: Most of the preprocessing functions operate on custom .fits structures and/or numpy arrays. For the .fits structures, we use sunpy functions that manipulate the .fits file with domain-specific transformations, e.g. normalization, header manipulation, and/or coordinate transformations. For the numpy arrays, the transformations are more ML-oriented, such as normalization, padding and/or NaN removal. A significant portion of this is already implemented in the ITI repo (see the editor.py module). The data are served through PyTorch DataLoaders and PyTorch Lightning DataModules; see, e.g., the SDO example in the ITI dataset.py module.
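
The array-level transformations named above (NaN removal, normalization, padding) can be sketched in plain numpy. This is a minimal illustration, not the code in editor.py: the function names, the [-1, 1] target range and the symmetric padding are assumptions made for the example.

```python
import numpy as np


def remove_nans(image: np.ndarray, fill_value: float = 0.0) -> np.ndarray:
    """Replace NaN and infinite pixels with a constant fill value."""
    return np.nan_to_num(image, nan=fill_value, posinf=fill_value, neginf=fill_value)


def normalize(image: np.ndarray, vmin: float, vmax: float) -> np.ndarray:
    """Clip to [vmin, vmax] and rescale linearly to [-1, 1] (assumed target range)."""
    clipped = np.clip(image, vmin, vmax)
    return 2.0 * (clipped - vmin) / (vmax - vmin) - 1.0


def pad_to_multiple(image: np.ndarray, multiple: int, pad_value: float = -1.0) -> np.ndarray:
    """Pad the last two axes symmetrically so both are divisible by `multiple`."""
    height, width = image.shape[-2:]
    pad_h = (-height) % multiple
    pad_w = (-width) % multiple
    pad_width = [(0, 0)] * (image.ndim - 2) + [
        (pad_h // 2, pad_h - pad_h // 2),
        (pad_w // 2, pad_w - pad_w // 2),
    ]
    return np.pad(image, pad_width, mode="constant", constant_values=pad_value)


# Toy 2x3 image with one missing pixel and one saturated pixel.
raw = np.array([[0.0, 50.0, np.nan], [100.0, 200.0, 25.0]])
clean = pad_to_multiple(normalize(remove_nans(raw), vmin=0.0, vmax=100.0), multiple=4)
```

The order matters in this sketch: NaNs are filled before normalization so that the fill value is rescaled together with the valid pixels.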

Analysis Ready Data: The ML-ready datasets consist of stacked preprocessing editors, which depend on the instrument in use. The dataloaders are built on PyTorch Lightning, which simplifies the training process.
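
The idea of a dataset built from stacked editors can be illustrated with a small sketch. The class names `Editor`, `FunctionEditor` and `StackedDataset` are hypothetical and only show the pattern of applying an instrument-specific list of steps in order; the actual classes in the ITI repo may differ.

```python
from typing import Any, Callable, List, Sequence


class Editor:
    """Hypothetical base class: one preprocessing step."""

    def call(self, data: Any) -> Any:
        raise NotImplementedError


class FunctionEditor(Editor):
    """Wraps a plain function as an editor (illustrative helper)."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def call(self, data: Any) -> Any:
        return self.fn(data)


class StackedDataset:
    """Applies a fixed list of editors, in order, to each raw sample."""

    def __init__(self, samples: Sequence[Any], editors: List[Editor]) -> None:
        self.samples = list(samples)
        self.editors = list(editors)

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Any:
        data = self.samples[index]
        for editor in self.editors:
            data = editor.call(data)
        return data


# An instrument-specific stack: clamp negative values, then scale to [0, 1].
editors = [
    FunctionEditor(lambda pixels: [max(p, 0.0) for p in pixels]),
    FunctionEditor(lambda pixels: [p / 100.0 for p in pixels]),
]
dataset = StackedDataset([[-5.0, 50.0, 100.0]], editors)
first_sample = dataset[0]
```

A class with `__len__` and `__getitem__` follows the map-style dataset protocol, which is what a PyTorch DataLoader expects.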

Earth Observation:

The EO modular preprocessing pipeline contains tools to download, geoprocess, and patch Earth Observation satellite data. All components are implemented and tested for the Geostationary Operational Environmental Satellites (GOES), the Meteosat Second Generation (MSG) satellites, and - as a third example - the TERRA and AQUA polar-orbiting satellites.

Downloading: Our downloaders build on existing packages like goes-2-go, EarthAccess and eumdac. For each downloader, we added functionality to select specific dates and times and to filter for measurements taken during certain times of the day. This is particularly helpful for separating day- and nighttime measurements. In addition to downloading Level-1 data, all downloaders have been extended and tested for downloading cloud masks. These are general-purpose downloaders: they can fetch any dataset available in the aforementioned databases, so users can easily extend them to their own use cases.
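
The date and time-of-day selection can be sketched independently of any download package. The helpers below are hypothetical and only build the list of acquisition times to request; the daily window is assumed to be given in the same time zone as the timestamps, and it may wrap past midnight for nighttime selections.

```python
from datetime import datetime, time, timedelta
from typing import List


def candidate_times(start: datetime, end: datetime, step: timedelta) -> List[datetime]:
    """Enumerate acquisition times from start (inclusive) to end (exclusive)."""
    times = []
    current = start
    while current < end:
        times.append(current)
        current += step
    return times


def within_daily_window(moment: datetime, window_start: time, window_end: time) -> bool:
    """True if the time of day lies in the window; supports windows wrapping past midnight."""
    time_of_day = moment.time()
    if window_start <= window_end:
        return window_start <= time_of_day <= window_end
    return time_of_day >= window_start or time_of_day <= window_end


# Two days sampled every six hours, keeping only acquisitions between 09:00 and 15:00.
requested = candidate_times(datetime(2020, 10, 1), datetime(2020, 10, 3), timedelta(hours=6))
daytime = [t for t in requested if within_daily_window(t, time(9, 0), time(15, 0))]
nighttime = [t for t in requested if within_daily_window(t, time(21, 0), time(3, 0))]
```

Whether a given window actually corresponds to daylight depends on the longitude of the observed region, so the window has to be chosen per satellite.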

Geoprocessing: Our geoprocessing pipeline reads in raw data files, and performs the following operations: match data & cloudmask timestamps, load radiances (or other relevant calibrations, e.g., reflectance, counts, brightness temperature), stack data channels, attach latitude / longitude information in native coordinate reference system, attach cloudmask, attach band wavelengths, attach measurement time. Our geoprocessing pipeline outputs standardized netcdf, zarr, or tiff datafiles that contain coordinate information, stacked band data, cloudmasks, and relevant metadata.
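
The first geoprocessing step, matching data and cloud mask timestamps, could look like the following sketch. Nearest-neighbour matching with a tolerance is an assumption here; the function name and the tolerance value are illustrative and not taken from the pipeline.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def match_timestamps(
    data_times: List[datetime],
    mask_times: List[datetime],
    tolerance: timedelta,
) -> Dict[datetime, Optional[datetime]]:
    """Map each data timestamp to the nearest cloud mask timestamp within tolerance, else None."""
    matches: Dict[datetime, Optional[datetime]] = {}
    for data_time in data_times:
        nearest = min(mask_times, key=lambda mask_time: abs(mask_time - data_time), default=None)
        if nearest is not None and abs(nearest - data_time) <= tolerance:
            matches[data_time] = nearest
        else:
            matches[data_time] = None
    return matches


data_times = [datetime(2020, 10, 1, 10, 0), datetime(2020, 10, 1, 10, 15), datetime(2020, 10, 1, 11, 0)]
mask_times = [datetime(2020, 10, 1, 10, 1), datetime(2020, 10, 1, 10, 14, 30)]
matched = match_timestamps(data_times, mask_times, tolerance=timedelta(minutes=2))
```

Scenes whose entry is `None` have no cloud mask close enough in time and could be skipped or processed without a mask, depending on the use case.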

Patching: Our patching operator takes in standardized netcdf files and creates unpaired, machine learning ready data patches that can be saved in either netcdf, tiff or numpy format. The patching is done through progressive striding over the input data.
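
Striding over a scene to cut patches can be sketched with numpy slicing. This is a minimal sliding-window version, assuming a (channels, height, width) layout and dropping partial patches at the borders; `iter_patches` is a hypothetical name and the actual operator may handle borders and invalid pixels differently.

```python
from typing import Iterator, Tuple

import numpy as np


def iter_patches(
    scene: np.ndarray, patch_size: int, stride: int
) -> Iterator[Tuple[int, int, np.ndarray]]:
    """Yield (row, col, patch) for every full patch in a (channels, height, width) scene."""
    _, height, width = scene.shape
    for row in range(0, height - patch_size + 1, stride):
        for col in range(0, width - patch_size + 1, stride):
            yield row, col, scene[:, row:row + patch_size, col:col + patch_size]


# Toy scene: 2 channels, 6 rows, 8 columns, cut into 4x4 patches with 50% overlap.
scene = np.arange(2 * 6 * 8, dtype=np.float32).reshape(2, 6, 8)
patches = list(iter_patches(scene, patch_size=4, stride=2))
```

Setting `stride` equal to `patch_size` gives non-overlapping patches; a smaller stride increases the number of training samples at the cost of redundancy.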

Integration into the ITI tool: We developed a GeoDataset and editors that plug into the existing ITI Dataloader. Our GeoDataset loads our geoprocessed and standardized netcdf, tiff or numpy files, and returns a dictionary of coords, data, cloudmask, and wavelengths. All our editors process these dictionaries and perform operations on relevant keys.
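
Editors that operate on such dictionaries can be sketched as follows. The dictionary keys follow the description above; the editor classes, their names and the wavelength unit (micrometres) are assumptions for illustration only.

```python
from typing import Any, Dict

import numpy as np


class NanFillEditor:
    """Hypothetical editor: replaces NaN pixels under one key with a fill value."""

    def __init__(self, key: str = "data", fill_value: float = 0.0) -> None:
        self.key = key
        self.fill_value = fill_value

    def call(self, sample: Dict[str, Any]) -> Dict[str, Any]:
        edited = dict(sample)  # shallow copy, the input dictionary is left untouched
        edited[self.key] = np.nan_to_num(sample[self.key], nan=self.fill_value)
        return edited


class BandSelectEditor:
    """Hypothetical editor: keeps bands within a wavelength range, updating data and wavelengths together."""

    def __init__(self, min_wavelength: float, max_wavelength: float) -> None:
        self.min_wavelength = min_wavelength
        self.max_wavelength = max_wavelength

    def call(self, sample: Dict[str, Any]) -> Dict[str, Any]:
        wavelengths = np.asarray(sample["wavelengths"])
        keep = (wavelengths >= self.min_wavelength) & (wavelengths <= self.max_wavelength)
        edited = dict(sample)
        edited["data"] = sample["data"][keep]
        edited["wavelengths"] = wavelengths[keep]
        return edited


data = np.arange(12, dtype=float).reshape(3, 2, 2)
data[1, 0, 0] = np.nan
sample = {
    "coords": {"latitude": np.zeros((2, 2)), "longitude": np.zeros((2, 2))},
    "data": data,
    "cloudmask": np.zeros((2, 2), dtype=np.uint8),
    "wavelengths": np.array([0.6, 3.9, 10.8]),
}

edited = sample
for editor in (NanFillEditor(), BandSelectEditor(3.0, 14.0)):
    edited = editor.call(edited)
```

Because each editor touches only the keys it needs, the remaining entries (here `coords` and `cloudmask`) pass through unchanged.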

Model Training

We provide a set of detailed training scripts which 1) initialize the LightningAI DataModule based on the specification of the dataset and appropriate preprocessing routines, 2) build a LightningAI Learner module based on the specified model architecture, loss, and optimizer, 3) train a model until convergence, and 4) save the model weights to Weights and Biases (wandb) for later use. This provides the user with a detailed specification of the end-to-end process from data to trained model. This portion is application agnostic, but we provide the model configurations and specifications used in a successful demonstration for the use case. It features a detailed config file (.yaml) for the individual (hyper-)parameters of the data, preprocessing, model specification, optimizer, loss function, training, callbacks, hardware specifications and postprocessing routines. ML researchers and domain experts alike will find this useful because it provides a reproducible specification for establishing baselines. We use wandb as a logging platform to showcase our training procedure and to make the saved weights available to other users.
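
The layout of such a configuration can be pictured as one section per item in the list above. The sketch below expresses it as a Python dictionary with a small completeness check; the section names mirror the text, while every key and value inside the sections is a made-up placeholder rather than the content of the actual .yaml files.

```python
from typing import Any, Dict, List

# Section names follow the list in the text; the entries inside are placeholders.
REQUIRED_SECTIONS = (
    "data",
    "preprocessing",
    "model",
    "optimizer",
    "loss",
    "training",
    "callbacks",
    "hardware",
    "postprocessing",
)

example_config: Dict[str, Dict[str, Any]] = {
    "data": {"source_instrument": "MSG/SEVIRI", "target_instrument": "GOES/ABI"},
    "preprocessing": {"editors": ["nan_removal", "normalization"]},
    "model": {"name": "placeholder_generator"},
    "optimizer": {"name": "adam", "learning_rate": 1e-4},
    "loss": {"name": "placeholder_loss"},
    "training": {"max_epochs": 10, "batch_size": 4},
    "callbacks": {"log_images_every_n_steps": 500},
    "hardware": {"accelerator": "gpu", "devices": 1},
    "postprocessing": {"output_format": "netcdf"},
}


def missing_sections(config: Dict[str, Any]) -> List[str]:
    """Return the required sections that the config does not define, in canonical order."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


problems = missing_sections(example_config)
```

Checking the configuration before training starts turns an incomplete file into an immediate, readable error instead of a failure partway through a run.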

Model Deployment and Inference

We provide a set of detailed inference scripts which allow users to download previously trained models for the specified use cases and apply them to an evaluation (or completely different) dataset. The scripts 1) load a LightningAI data module for evaluation, 2) load a pre-trained LightningAI learner based on a config + weights provided by wandb, 3) perform inference on the evaluation dataset, and 4) save the results to an appropriate output file (e.g. FITS or TIFF). This provides the user with a detailed specification of the end-to-end process from data to predictions with a pretrained model. This process will also provide API access to inference endpoints for deployed trained models.
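
Steps 3 and 4 can be sketched without any deep learning dependency. In the example below the pretrained learner is replaced by a stand-in function, and the results are written as a .npy file instead of FITS or TIFF to keep the example self-contained; `run_inference` and `stand_in_model` are hypothetical names.

```python
import os
import tempfile
from typing import Callable, List

import numpy as np


def run_inference(
    model: Callable[[np.ndarray], np.ndarray],
    samples: List[np.ndarray],
    batch_size: int,
) -> np.ndarray:
    """Apply `model` batch by batch and return the concatenated predictions."""
    predictions = []
    for start in range(0, len(samples), batch_size):
        batch = np.stack(samples[start:start + batch_size])
        predictions.append(model(batch))
    return np.concatenate(predictions, axis=0)


def stand_in_model(batch: np.ndarray) -> np.ndarray:
    """Placeholder for a pretrained translation model: doubles every pixel."""
    return batch * 2.0


# Five single-channel 4x4 samples, processed in batches of two.
samples = [np.full((1, 4, 4), float(index)) for index in range(5)]
predictions = run_inference(stand_in_model, samples, batch_size=2)

# Save the results; a real script would write FITS or TIFF with the appropriate metadata.
output_path = os.path.join(tempfile.mkdtemp(), "predictions.npy")
np.save(output_path, predictions)
```

Processing in batches keeps memory use bounded, which matters for full-disk or full-scene inputs.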

Feedback

Your feedback is invaluable to us. If you have any feedback about the model, please feel free to share it with us. You can do this by submitting issues on our open-source repository, https://github.com/spaceml-org/InstrumentToInstrument.

Citation

If this model helped your research, please cite ITI in your publications.