katielink committed on
Commit 9a5a418 · 1 Parent(s): e4ba616

update dataset processing

Files changed (4)
  1. README.md +145 -0
  2. configs/metadata.json +2 -1
  3. docs/README.md +138 -0
  4. scripts/data_process.py +74 -0
README.md ADDED
@@ -0,0 +1,145 @@
+ ---
+ tags:
+ - monai
+ - medical
+ library_name: monai
+ license: apache-2.0
+ ---
+ # Description
+ A pre-trained model for the endoscopic inbody classification task.
+
+ # Model Overview
+ This model is trained using the SEResNet50 architecture, whose details can be found in [1]. All datasets are from private samples of [Activ Surgical](https://www.activsurgical.com/). Samples in the training and validation datasets come from the same four videos, while test samples come from two different videos.
+ The [PyTorch model](https://drive.google.com/file/d/14CS-s1uv2q6WedYQGeFbZeEWIkoyNa-x/view?usp=sharing) and [TorchScript model](https://drive.google.com/file/d/1fOoJ4n5DWKHrt9QXTZ2sXwr9C-YvVGCM/view?usp=sharing) are shared on Google Drive. Modify the `bundle_root` parameter specified in `configs/train.json` and `configs/inference.json` to reflect where the models are downloaded. The downloaded models are expected to be placed in the `models/` directory under `bundle_root`.
+
+ ## Data
+ Datasets used in this work were provided by [Activ Surgical](https://www.activsurgical.com/). Here is a [link](https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/inbody_outbody_samples.zip) to 20 samples (10 in-body and 10 out-of-body) showing what this dataset looks like. After downloading the samples, the Python script `scripts/data_process.py` can be used to generate the label JSON files by running the command below, replacing the `datapath` and `outpath` parameters as needed.
+ ```
+ python scripts/data_process.py --datapath /path/to/data/root --outpath /path/to/label/folder
+ ```
+
+ After generating the label files, please modify the `dataset_dir` parameter specified in `configs/train.json` and `configs/inference.json` to reflect where the label files are.
+
+ The input label JSON should be a list of dicts, each containing `image` and `label` keys. An example format is shown below.
+
+ ```
+ [
+     {
+         "image": "/path/to/image/image_name0.jpg",
+         "label": 0
+     },
+     {
+         "image": "/path/to/image/image_name1.jpg",
+         "label": 0
+     },
+     {
+         "image": "/path/to/image/image_name2.jpg",
+         "label": 1
+     },
+     ...
+     {
+         "image": "/path/to/image/image_namek.jpg",
+         "label": 0
+     }
+ ]
+ ```
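
As a quick sanity check, a generated label file can be loaded and its class balance summarized in a few lines of Python. This is a minimal sketch; the helper name `summarize_labels` is ours, not part of the bundle:

```python
def summarize_labels(entries):
    """Count in-body (label 0) and out-of-body (label 1) entries in a label list."""
    counts = {0: 0, 1: 0}
    for item in entries:
        counts[item["label"]] += 1
    return counts

# Inline example; in practice load a generated file instead,
# e.g. entries = json.load(open("/path/to/label/folder/train.json")).
entries = [
    {"image": "/path/to/image/image_name0.jpg", "label": 0},
    {"image": "/path/to/image/image_name1.jpg", "label": 1},
    {"image": "/path/to/image/image_name2.jpg", "label": 0},
]
print(summarize_labels(entries))  # {0: 2, 1: 1}
```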
+
+ ## Training configuration
+ The training was performed on a GPU with at least 12 GB of memory.
+
+ Actual Model Input: 256 x 256 x 3
+
+ ## Input and output formats
+ Input: 3-channel video frames
+
+ Output: a probability vector of length 2: label 0: in body; label 1: out of body
+
+ ## Scores
+ This model achieves the following accuracy score on the test dataset:
+
+ Accuracy = 0.98
+
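
Since the output is a two-element probability vector, the predicted class is simply the index of the larger entry. A minimal pure-Python sketch (the names `predict_label` and `LABELS` are ours):

```python
LABELS = {0: "in body", 1: "out of body"}

def predict_label(probs):
    """Map a length-2 probability vector to (class index, class name)."""
    assert len(probs) == 2
    idx = max(range(len(probs)), key=lambda i: probs[i])  # argmax
    return idx, LABELS[idx]

print(predict_label([0.93, 0.07]))  # (0, 'in body')
print(predict_label([0.12, 0.88]))  # (1, 'out of body')
```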
+ ## Commands example
+ Execute training:
+
+ ```
+ python -m monai.bundle run training \
+     --meta_file configs/metadata.json \
+     --config_file configs/train.json \
+     --logging_file configs/logging.conf
+ ```
+
+ Override the `train` config to execute multi-GPU training:
+
+ ```
+ torchrun --standalone --nnodes=1 --nproc_per_node=2 -m monai.bundle run training \
+     --meta_file configs/metadata.json \
+     --config_file "['configs/train.json','configs/multi_gpu_train.json']" \
+     --logging_file configs/logging.conf
+ ```
+
+ Please note that the distributed training options depend on the actual running environment; you may need to remove `--standalone`, modify `--nnodes`, or make other changes according to the machine you use.
+ Please refer to [PyTorch's official tutorial](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) for more details.
+
+ Override the `train` config to execute evaluation with the trained model:
+
+ ```
+ python -m monai.bundle run evaluating \
+     --meta_file configs/metadata.json \
+     --config_file "['configs/train.json','configs/evaluate.json']" \
+     --logging_file configs/logging.conf
+ ```
+
+ Execute inference:
+
+ ```
+ python -m monai.bundle run evaluating \
+     --meta_file configs/metadata.json \
+     --config_file configs/inference.json \
+     --logging_file configs/logging.conf
+ ```
+
+ Export checkpoint to TorchScript file:
+
+ ```
+ python -m monai.bundle ckpt_export network_def \
+     --filepath models/model.ts \
+     --ckpt_file models/model.pt \
+     --meta_file configs/metadata.json \
+     --config_file configs/inference.json
+ ```
+
+ Export the checkpoint to an ONNX file (tested with PyTorch 1.12.0):
+
+ ```
+ python scripts/export_to_onnx.py --model models/model.pt --outpath models/model.onnx
+ ```
+
+ Export a TensorRT float16 model from the ONNX model:
+
+ ```
+ trtexec --onnx=models/model.onnx --saveEngine=models/model.trt --fp16 \
+     --minShapes=INPUT__0:1x3x256x256 \
+     --optShapes=INPUT__0:16x3x256x256 \
+     --maxShapes=INPUT__0:32x3x256x256 \
+     --shapes=INPUT__0:8x3x256x256
+ ```
+ This command requires TensorRT and a matching CUDA version installed in the environment. For details on installing TensorRT, please refer to [this link](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html).
+
+ # References
+ [1] J. Hu, L. Shen and G. Sun, "Squeeze-and-Excitation Networks," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132-7141. https://arxiv.org/pdf/1709.01507.pdf
+
+ # License
+ Copyright (c) MONAI Consortium
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
configs/metadata.json CHANGED
@@ -1,7 +1,8 @@
  {
  "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json",
- "version": "0.2.2",
+ "version": "0.3.0",
  "changelog": {
+ "0.3.0": "update dataset processing",
  "0.2.2": "update to use monai 1.0.1",
  "0.2.1": "enhance readme on commands example",
  "0.2.0": "update license files",
docs/README.md ADDED
@@ -0,0 +1,138 @@ (same content as README.md above, without the YAML front matter)
scripts/data_process.py ADDED
@@ -0,0 +1,74 @@
+ import argparse
+ import json
+ import os
+
+ # Fractions used to split each class into train/val/test subsets.
+ train_rate = 0.6
+ val_rate = 0.2
+ test_rate = 0.2
+
+
+ def save_json(content, path, filename):
+     os.makedirs(path, exist_ok=True)
+     dst_file_name = os.path.join(path, filename)
+     with open(dst_file_name, "w") as fp:
+         json.dump(content, fp, indent=4, separators=(",", ":"))
+
+
+ def generate_labels(data_path, output_path):
+     """
+     Generate train/val/test label JSON files for the in-body/out-of-body dataset.
+
+     Args:
+         data_path: path to the classification dataset, which must contain `inbody` and `outbody` directories.
+         output_path: path to save the label JSON files.
+     """
+     data_list = [os.path.join(root, x) for root, _, filenames in os.walk(data_path) for x in filenames if x.endswith(".jpg")]
+     # Label 0 for images under `inbody`, label 1 for images under `outbody`.
+     label_list = [int("outbody" in os.path.basename(os.path.dirname(x))) for x in data_list]
+     data_label_json = [{"image": x, "label": y} for x, y in zip(data_list, label_list)]
+     inbody_list = [x for x in data_label_json if x["label"] == 0]
+     outbody_list = [x for x in data_label_json if x["label"] == 1]
+     # Split each class separately so the class balance is preserved.
+     inbody_train_len = int(len(inbody_list) * train_rate)
+     outbody_train_len = int(len(outbody_list) * train_rate)
+     inbody_val_len = int(len(inbody_list) * (train_rate + val_rate))
+     outbody_val_len = int(len(outbody_list) * (train_rate + val_rate))
+     train_list = inbody_list[:inbody_train_len] + outbody_list[:outbody_train_len]
+     val_list = inbody_list[inbody_train_len:inbody_val_len] + outbody_list[outbody_train_len:outbody_val_len]
+     test_list = inbody_list[inbody_val_len:] + outbody_list[outbody_val_len:]
+     save_json(train_list, output_path, "train.json")
+     save_json(val_list, output_path, "val.json")
+     save_json(test_list, output_path, "test.json")
+
+
+ if __name__ == "__main__":
+     parser = argparse.ArgumentParser()
+     # path to the downloaded dataset.
+     parser.add_argument(
+         "--datapath",
+         type=str,
+         default=r"/workspace/data/endoscopic_inbody_classification",
+         help="Path to the root of the downloaded dataset.",
+     )
+
+     # path to save the label JSON files.
+     parser.add_argument(
+         "--outpath",
+         type=str,
+         default=r"/workspace/data/endoscopic_inbody_classification",
+         help="Path to the folder where the label JSON files will be saved.",
+     )
+
+     args = parser.parse_args()
+     data_path = args.datapath
+     out_path = args.outpath
+
+     generate_labels(data_path, out_path)
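
The split in `generate_labels` is deterministic per class: each class list is cut at `train_rate` and `train_rate + val_rate`. The arithmetic can be sketched standalone as below; the helper name `split_sizes` is ours, for illustration only:

```python
train_rate, val_rate = 0.6, 0.2

def split_sizes(n):
    """Return (train, val, test) sizes for a class with n images."""
    train_len = int(n * train_rate)
    val_len = int(n * (train_rate + val_rate))
    return train_len, val_len - train_len, n - val_len

print(split_sizes(10))  # (6, 2, 2)
```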