================
pytorch-caney
================

A Python package of PyTorch tools for geospatial science problems.

.. image:: https://zenodo.org/badge/472450059.svg
      :target: https://zenodo.org/badge/latestdoi/472450059

Objectives
------------

- Library to process remote sensing imagery using GPU and CPU parallelization.
- Machine Learning and Deep Learning image classification and regression.
- Agnostic array and vector-like data structures.
- Notebook-based user interface environments for easy-to-use AI/ML projects.
- Example notebooks for a quick start on AI/ML with your own data.

Installation
----------------

This library is intended to accelerate the development of data science products
for remote sensing satellite imagery and other applications. pytorch-caney can be installed
by itself, but instructions for installing the full environments are listed under the ``requirements``
directory so that projects, examples, and notebooks can be run.

Note: pip installations do not include CUDA libraries for GPU support. Make sure the NVIDIA libraries
are installed locally on the system if you are not using conda/mamba.

To build a Singularity sandbox container from the published Docker image:

.. code-block:: bash

    module load singularity # if a module needs to be loaded
    singularity build --sandbox pytorch-caney-container docker://nasanccs/pytorch-caney:latest
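
Before relying on GPU support from a pip installation, it can help to check whether NVIDIA tooling is visible on the system. The sketch below is a heuristic only: it assumes the ``nvidia-smi`` utility is installed alongside the driver and is on ``PATH``, and its presence does not prove that the CUDA libraries PyTorch needs are available. The function name is hypothetical.

.. code-block:: python

    import shutil


    def nvidia_driver_tools_found():
        """Return True if the nvidia-smi executable is on PATH."""
        return shutil.which("nvidia-smi") is not None


    if nvidia_driver_tools_found():
        print("nvidia-smi found; an NVIDIA driver appears to be installed.")
    else:
        print("nvidia-smi not found; GPU support may be unavailable on this system.")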


Why Caney?
---------------

"Caney" means longhouse in Taíno.

Contributors
-------------

- Jordan Alexis Caraballo-Vega, [email protected]
- Caleb Spradlin, [email protected]
- Jian Li, [email protected]

Contributing
-------------

Please see our `guide for contributing to pytorch-caney <CONTRIBUTING.md>`_.

SatVision
------------

+---------------+--------------+------------+------------+
| Name          | Pretrain     | Resolution | Parameters |
+===============+==============+============+============+
| SatVision-B   | MODIS-1.9-M  | 192x192    | 84.5M      |
+---------------+--------------+------------+------------+

SatVision Datasets
-----------------------

+---------------+-----------+------------+-------------+
| Name          | Bands     | Resolution | Image Chips |
+===============+===========+============+=============+
| MODIS-Small   | 7         | 128x128    | 1,994,131   |
+---------------+-----------+------------+-------------+
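
As a rough sizing aid, the figures in the table can be multiplied out to estimate the dataset's uncompressed footprint. The sketch below is an estimate under stated assumptions, not a measured value: ``bytes_per_value=2`` assumes 16-bit integer storage, which this README does not specify, and the function name is hypothetical.

.. code-block:: python

    def estimate_dataset_bytes(n_chips, bands, height, width, bytes_per_value=2):
        """Estimate the uncompressed size in bytes of a stack of image chips."""
        return n_chips * bands * height * width * bytes_per_value


    # Chip count, band count, and chip size come from the MODIS-Small row;
    # bytes_per_value is an assumption.
    modis_small_bytes = estimate_dataset_bytes(1_994_131, 7, 128, 128)
    print(f"{modis_small_bytes / 1e9:.1f} GB uncompressed (assuming 2 bytes per value)")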

MODIS Surface Reflectance (MOD09GA) Band Details
------------------------------------------------------

+-----------------+----------------+
| Band Name       | Bandwidth (µm) |
+=================+================+
| sur_refl_b01_1  | 0.620 - 0.670  |
+-----------------+----------------+
| sur_refl_b02_1  | 0.841 - 0.876  |
+-----------------+----------------+
| sur_refl_b03_1  | 0.459 - 0.479  |
+-----------------+----------------+
| sur_refl_b04_1  | 0.545 - 0.565  |
+-----------------+----------------+
| sur_refl_b05_1  | 1.230 - 1.250  |
+-----------------+----------------+
| sur_refl_b06_1  | 1.628 - 1.652  |
+-----------------+----------------+
| sur_refl_b07_1  | 2.105 - 2.155  |
+-----------------+----------------+
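
The band table can also be kept as a small lookup structure in code. The sketch below is illustrative only: ``MOD09GA_BANDS`` and ``band_center`` are hypothetical names that are not part of pytorch-caney, and the ranges are copied from the table above, assuming micrometers.

.. code-block:: python

    # Hypothetical lookup table; ranges copied from the band table (micrometers).
    MOD09GA_BANDS = {
        "sur_refl_b01_1": (0.620, 0.670),
        "sur_refl_b02_1": (0.841, 0.876),
        "sur_refl_b03_1": (0.459, 0.479),
        "sur_refl_b04_1": (0.545, 0.565),
        "sur_refl_b05_1": (1.230, 1.250),
        "sur_refl_b06_1": (1.628, 1.652),
        "sur_refl_b07_1": (2.105, 2.155),
    }


    def band_center(name):
        """Return the midpoint of a band's wavelength range in micrometers."""
        low, high = MOD09GA_BANDS[name]
        return (low + high) / 2


    # Band names ordered from shortest to longest center wavelength.
    bands_by_wavelength = sorted(MOD09GA_BANDS, key=band_center)

Note that the band numbering does not follow wavelength order, so sorting by center wavelength gives a different sequence than sorting by name.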

Pre-training with Masked Image Modeling
-----------------------------------------

To pre-train the SwinV2 base model with masked image modeling (MIM), run:

.. code-block:: bash

    torchrun --nproc_per_node <NGPUS> pytorch-caney/pytorch_caney/pipelines/pretraining/mim.py --cfg <config-file> --dataset <dataset-name> --data-paths <path-to-data-subfolder-1> --batch-size <batch-size> --output <output-dir> --enable-amp
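
For readers new to masked image modeling, the idea is to hide a random subset of image patches and train the model to reconstruct them. The sketch below illustrates only the random patch-masking step in plain Python. It is not pytorch-caney's implementation, and ``mask_patch_size=32`` and ``mask_ratio=0.6`` are illustrative assumptions rather than values taken from the project's config files.

.. code-block:: python

    import random


    def random_patch_mask(image_size=192, mask_patch_size=32, mask_ratio=0.6, seed=None):
        """Return a square grid of 0/1 flags, one per patch (1 = masked)."""
        if image_size % mask_patch_size != 0:
            raise ValueError("image_size must be divisible by mask_patch_size")
        grid = image_size // mask_patch_size
        n_patches = grid * grid
        n_masked = round(n_patches * mask_ratio)
        rng = random.Random(seed)
        # Pick which patch indices to hide, without replacement.
        masked = set(rng.sample(range(n_patches), n_masked))
        return [
            [1 if row * grid + col in masked else 0 for col in range(grid)]
            for row in range(grid)
        ]


    mask = random_patch_mask(seed=0)
    for row in mask:
        print(" ".join(str(flag) for flag in row))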

For example, to run on a compute node with 4 GPUs and a batch size of 128 on the MODIS SatVision pre-training dataset with a base SwinV2 model, run:

.. code-block:: bash

    singularity shell --nv -B <mounts> /path/to/container/pytorch-caney-container
    Singularity> export PYTHONPATH=$PWD:$PWD/pytorch-caney
    Singularity> torchrun --nproc_per_node 4 pytorch-caney/pytorch_caney/pipelines/pretraining/mim.py --cfg pytorch-caney/examples/satvision/mim_pretrain_swinv2_satvision_base_192_window12_800ep.yaml --dataset MODIS --data-paths /explore/nobackup/projects/ilab/data/satvision/pretraining/training_* --batch-size 128 --output . --enable-amp
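
When launching runs from a script, the same command can be assembled programmatically. The sketch below is a hypothetical helper, not part of pytorch-caney: it uses only the flags shown in the commands above, and the data directories in the example call are placeholders.

.. code-block:: python

    def build_pretrain_command(ngpus, cfg, dataset, data_paths, batch_size, output):
        """Assemble the torchrun argument list for mim.py from the flags shown above."""
        if not data_paths:
            raise ValueError("at least one data path is required")
        return [
            "torchrun", "--nproc_per_node", str(ngpus),
            "pytorch-caney/pytorch_caney/pipelines/pretraining/mim.py",
            "--cfg", cfg,
            "--dataset", dataset,
            "--data-paths", *data_paths,
            "--batch-size", str(batch_size),
            "--output", output,
            "--enable-amp",
        ]


    cmd = build_pretrain_command(
        ngpus=4,
        cfg="pytorch-caney/examples/satvision/"
            "mim_pretrain_swinv2_satvision_base_192_window12_800ep.yaml",
        dataset="MODIS",
        # Placeholder directories; substitute your own data subfolders.
        data_paths=["/path/to/training_0", "/path/to/training_1"],
        batch_size=128,
        output=".",
    )
    print(" ".join(cmd))

The resulting list can be passed to ``subprocess.run(cmd, check=True)``. Because no shell is involved in that case, wildcards such as ``training_*`` must be expanded beforehand, for example with ``glob.glob``.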


The following example script runs the exact configuration used to pre-train the SatVision-base model with MIM on the MODIS pre-training dataset.

.. code-block:: bash

    singularity shell --nv -B <mounts> /path/to/container/pytorch-caney-container
    Singularity> cd pytorch-caney/examples/satvision
    Singularity> ./run_satvision_pretrain.sh


Fine-tuning SatVision-base
-----------------------------

To fine-tune the pre-trained SatVision-base model, run:

.. code-block:: bash

    torchrun --nproc_per_node <NGPUS> pytorch-caney/pytorch_caney/pipelines/finetuning/finetune.py --cfg <config-file> --pretrained <path-to-pretrained> --dataset <dataset-name> --data-paths <path-to-data-subfolder-1> --batch-size <batch-size> --output <output-dir> --enable-amp

See the example config files ``pytorch-caney/examples/satvision/finetune_satvision_base_*.yaml`` for how to structure a config file for fine-tuning.
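
To list the example fine-tuning configs available in a checkout, a short helper such as the one below can be used. This is a sketch with a hypothetical function name; it assumes it is run from the directory that contains the ``pytorch-caney`` clone, and it returns an empty list if the directory is missing.

.. code-block:: python

    from pathlib import Path


    def find_finetune_configs(examples_dir="pytorch-caney/examples/satvision"):
        """Return fine-tuning config paths matching the pattern above, sorted by name."""
        return sorted(Path(examples_dir).glob("finetune_satvision_base_*.yaml"))


    for cfg in find_finetune_configs():
        print(cfg)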


Testing
------------

Run the following commands to execute linting and the unit tests. Both run in a temporary venv that is used only for testing.

.. code-block:: bash

    git clone git@github.com:nasa-nccs-hpda/pytorch-caney.git
    cd pytorch-caney; bash test.sh


Alternatively, run the unit tests directly inside the container or a conda environment:

.. code-block:: bash

    git clone git@github.com:nasa-nccs-hpda/pytorch-caney.git
    singularity build --sandbox pytorch-caney-container docker://nasanccs/pytorch-caney:latest
    singularity shell --nv -B <mounts> /path/to/container/pytorch-caney-container
    cd pytorch-caney; python -m unittest discover pytorch_caney/tests

.. code-block:: bash

    git clone git@github.com:nasa-nccs-hpda/pytorch-caney.git
    cd pytorch-caney; conda env create -f requirements/environment_gpu.yml;
    conda activate pytorch-caney
    python -m unittest discover pytorch_caney/tests
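
``python -m unittest discover`` collects ``unittest.TestCase`` subclasses from files whose names match ``test*.py`` by default. The following is a minimal sketch of such a file. ``make_chip`` and ``TestChipLayout`` are hypothetical names used for illustration and do not exist in pytorch-caney; the 7-band, 128x128 shape mirrors the MODIS-Small row in the dataset table.

.. code-block:: python

    import unittest


    def make_chip(bands=7, height=128, width=128):
        """Build a zero-filled chip as nested lists: bands x height x width."""
        return [[[0] * width for _ in range(height)] for _ in range(bands)]


    class TestChipLayout(unittest.TestCase):
        def test_band_count(self):
            self.assertEqual(len(make_chip()), 7)

        def test_spatial_size(self):
            chip = make_chip()
            self.assertEqual(len(chip[0]), 128)
            self.assertEqual(len(chip[0][0]), 128)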


References
------------

- `Pytorch Lightning <https://github.com/Lightning-AI/lightning>`_ 
- `Swin Transformer <https://github.com/microsoft/Swin-Transformer>`_ 
- `SimMIM <https://github.com/microsoft/SimMIM>`_