katielink committed on
Commit 9a5a418 · 1 Parent(s): e4ba616

update dataset processing

Files changed (4)
  1. README.md +145 -0
  2. configs/metadata.json +2 -1
  3. docs/README.md +138 -0
  4. scripts/data_process.py +74 -0
README.md ADDED
@@ -0,0 +1,145 @@
+ ---
+ tags:
+ - monai
+ - medical
+ library_name: monai
+ license: apache-2.0
+ ---
+ # Description
+ A pre-trained model for the endoscopic inbody classification task.
+
+ # Model Overview
+ This model is trained using the SEResNet50 architecture, whose details can be found in [1]. All datasets are from private samples of [Activ Surgical](https://www.activsurgical.com/). Samples in the training and validation datasets come from the same four videos, while test samples come from two different videos.
+ The [PyTorch model](https://drive.google.com/file/d/14CS-s1uv2q6WedYQGeFbZeEWIkoyNa-x/view?usp=sharing) and [TorchScript model](https://drive.google.com/file/d/1fOoJ4n5DWKHrt9QXTZ2sXwr9C-YvVGCM/view?usp=sharing) are shared on Google Drive. Modify the `bundle_root` parameter specified in `configs/train.json` and `configs/inference.json` to reflect where the models are downloaded. The downloaded models are expected to be placed in the `models/` directory under `bundle_root`.
+
+ ## Data
+ Datasets used in this work were provided by [Activ Surgical](https://www.activsurgical.com/). Here is a [link](https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/inbody_outbody_samples.zip) to 20 samples (10 in-body and 10 out-of-body) showing what this dataset looks like. After downloading the samples, the Python script `scripts/data_process.py` can be used to generate the label JSON files by running the command below, replacing the `datapath` and `outpath` parameters as needed.
+ ```
+ python scripts/data_process.py --datapath /path/to/data/root --outpath /path/to/label/folder
+ ```
+
+ After generating the label files, please modify the `dataset_dir` parameter specified in `configs/train.json` and `configs/inference.json` to reflect where the label files are.
+
+ The input label JSON should be a list of dicts, each containing `image` and `label` keys. An example format is shown below.
+
+ ```
+ [
+     {
+         "image": "/path/to/image/image_name0.jpg",
+         "label": 0
+     },
+     {
+         "image": "/path/to/image/image_name1.jpg",
+         "label": 0
+     },
+     {
+         "image": "/path/to/image/image_name2.jpg",
+         "label": 1
+     },
+     ...
+     {
+         "image": "/path/to/image/image_namek.jpg",
+         "label": 0
+     }
+ ]
+ ```
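
As a quick sanity check, a generated label file can be loaded and its class balance summarized in a few lines of Python. This is a minimal sketch; the helper name `summarize_labels` is ours, not part of the bundle:

```python
def summarize_labels(entries):
    """Count in-body (label 0) and out-of-body (label 1) entries in a label list."""
    counts = {0: 0, 1: 0}
    for item in entries:
        counts[item["label"]] += 1
    return counts

# Inline example; in practice load a generated file instead,
# e.g. entries = json.load(open("/path/to/label/folder/train.json")).
entries = [
    {"image": "/path/to/image/image_name0.jpg", "label": 0},
    {"image": "/path/to/image/image_name1.jpg", "label": 1},
    {"image": "/path/to/image/image_name2.jpg", "label": 0},
]
print(summarize_labels(entries))  # {0: 2, 1: 1}
```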
+
+ ## Training configuration
+ The training was performed on a GPU with at least 12 GB of memory.
+
+ Actual Model Input: 256 x 256 x 3
+
+ ## Input and output formats
+ Input: 3-channel video frames
+
+ Output: a probability vector of length 2: label 0: in body; label 1: out of body
+
+ ## Scores
+ This model achieves the following accuracy score on the test dataset:
+
+ Accuracy = 0.98
+
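
Since the output is a two-element probability vector, the predicted class is simply the index of the larger entry. A minimal pure-Python sketch (the names `predict_label` and `LABELS` are ours):

```python
LABELS = {0: "in body", 1: "out of body"}

def predict_label(probs):
    """Map a length-2 probability vector to (class index, class name)."""
    assert len(probs) == 2
    idx = max(range(len(probs)), key=lambda i: probs[i])  # argmax
    return idx, LABELS[idx]

print(predict_label([0.93, 0.07]))  # (0, 'in body')
print(predict_label([0.12, 0.88]))  # (1, 'out of body')
```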
+ ## Commands example
+ Execute training:
+
+ ```
+ python -m monai.bundle run training \
+     --meta_file configs/metadata.json \
+     --config_file configs/train.json \
+     --logging_file configs/logging.conf
+ ```
+
+ Override the `train` config to execute multi-GPU training:
+
+ ```
+ torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run training \
+     --meta_file configs/metadata.json \
+     --config_file "['configs/train.json','configs/multi_gpu_train.json']" \
+     --logging_file configs/logging.conf
+ ```
+
+ Please note that the distributed training options depend on the actual running environment; you may need to remove `--standalone`, modify `--nnodes`, or make other changes according to the machine you use.
+ Please refer to [PyTorch's official tutorial](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) for more details.
+
+ Override the `train` config to execute evaluation with the trained model:
+
+ ```
+ python -m monai.bundle run evaluating \
+     --meta_file configs/metadata.json \
+     --config_file "['configs/train.json','configs/evaluate.json']" \
+     --logging_file configs/logging.conf
+ ```
+
+ Execute inference:
+
+ ```
+ python -m monai.bundle run evaluating \
+     --meta_file configs/metadata.json \
+     --config_file configs/inference.json \
+     --logging_file configs/logging.conf
+ ```
+
+ Export checkpoint to TorchScript file:
+
+ ```
+ python -m monai.bundle ckpt_export network_def \
+     --filepath models/model.ts \
+     --ckpt_file models/model.pt \
+     --meta_file configs/metadata.json \
+     --config_file configs/inference.json
+ ```
+
+ Export the checkpoint to an ONNX file (tested with PyTorch 1.12.0):
+
+ ```
+ python scripts/export_to_onnx.py --model models/model.pt --outpath models/model.onnx
+ ```
+
+ Export a TensorRT float16 model from the ONNX model:
+
+ ```
+ trtexec --onnx=models/model.onnx --saveEngine=models/model.trt --fp16 \
+     --minShapes=INPUT__0:1x3x256x256 \
+     --optShapes=INPUT__0:16x3x256x256 \
+     --maxShapes=INPUT__0:32x3x256x256 \
+     --shapes=INPUT__0:8x3x256x256
+ ```
+ This command requires TensorRT and a matching CUDA version installed in the environment. For details on installing TensorRT, please refer to [this link](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html).
+
+ # References
+ [1] J. Hu, L. Shen and G. Sun, "Squeeze-and-Excitation Networks," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132-7141. https://arxiv.org/pdf/1709.01507.pdf
+
+ # License
+ Copyright (c) MONAI Consortium
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
configs/metadata.json CHANGED
@@ -1,7 +1,8 @@
  {
  "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json",
- "version": "0.2.2",
+ "version": "0.3.0",
  "changelog": {
+ "0.3.0": "update dataset processing",
  "0.2.2": "update to use monai 1.0.1",
  "0.2.1": "enhance readme on commands example",
  "0.2.0": "update license files",
docs/README.md ADDED
@@ -0,0 +1,138 @@ (same content as README.md above, without the YAML front matter)
scripts/data_process.py ADDED
@@ -0,0 +1,74 @@
+ import argparse
+ import json
+ import os
+
+ # Fractions used to split each class into train/val/test subsets.
+ train_rate = 0.6
+ val_rate = 0.2
+ test_rate = 0.2
+
+
+ def save_json(content, path, filename):
+     os.makedirs(path, exist_ok=True)
+     dst_file_name = os.path.join(path, filename)
+     with open(dst_file_name, "w") as fp:
+         json.dump(content, fp, indent=4, separators=(",", ":"))
+
+
+ def generate_labels(data_path, output_path):
+     """
+     Generate train/val/test label JSON files for the in-body/out-of-body dataset.
+
+     Args:
+         data_path: path to the classification dataset, which must contain `inbody` and `outbody` directories.
+         output_path: path to save the label JSON files.
+     """
+     data_list = [os.path.join(root, x) for root, _, filenames in os.walk(data_path) for x in filenames if x.endswith(".jpg")]
+     # Label 0 for images under `inbody`, label 1 for images under `outbody`.
+     label_list = [int("outbody" in os.path.basename(os.path.dirname(x))) for x in data_list]
+     data_label_json = [{"image": x, "label": y} for x, y in zip(data_list, label_list)]
+     inbody_list = [x for x in data_label_json if x["label"] == 0]
+     outbody_list = [x for x in data_label_json if x["label"] == 1]
+     # Split each class separately so the class balance is preserved.
+     inbody_train_len = int(len(inbody_list) * train_rate)
+     outbody_train_len = int(len(outbody_list) * train_rate)
+     inbody_val_len = int(len(inbody_list) * (train_rate + val_rate))
+     outbody_val_len = int(len(outbody_list) * (train_rate + val_rate))
+     train_list = inbody_list[:inbody_train_len] + outbody_list[:outbody_train_len]
+     val_list = inbody_list[inbody_train_len:inbody_val_len] + outbody_list[outbody_train_len:outbody_val_len]
+     test_list = inbody_list[inbody_val_len:] + outbody_list[outbody_val_len:]
+     save_json(train_list, output_path, "train.json")
+     save_json(val_list, output_path, "val.json")
+     save_json(test_list, output_path, "test.json")
+
+
+ if __name__ == "__main__":
+     parser = argparse.ArgumentParser()
+     # path to the downloaded dataset.
+     parser.add_argument(
+         "--datapath",
+         type=str,
+         default=r"/workspace/data/endoscopic_inbody_classification",
+         help="Path to the root of the downloaded dataset.",
+     )
+
+     # path to save the label JSON files.
+     parser.add_argument(
+         "--outpath",
+         type=str,
+         default=r"/workspace/data/endoscopic_inbody_classification",
+         help="Path to the folder where the label JSON files will be saved.",
+     )
+
+     args = parser.parse_args()
+     data_path = args.datapath
+     out_path = args.outpath
+
+     generate_labels(data_path, out_path)
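
The split in `generate_labels` is deterministic per class: each class list is cut at `train_rate` and `train_rate + val_rate`. The arithmetic can be sketched standalone as below; the helper name `split_sizes` is ours, for illustration only:

```python
train_rate, val_rate = 0.6, 0.2

def split_sizes(n):
    """Return (train, val, test) sizes for a class with n images."""
    train_len = int(n * train_rate)
    val_len = int(n * (train_rate + val_rate))
    return train_len, val_len - train_len, n - val_len

print(split_sizes(10))  # (6, 2, 2)
```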