葛政 (Intern) committed on
Commit
d9f51c5
·
1 Parent(s): 65998a0

feat(demo): add OpenVINO and ONNXRuntime demo

README.md CHANGED
@@ -1,2 +1,128 @@
- # YOLOX
- Higher performance and anchor-free YOLO detector. Code of train/test/deploy included.
+ <div align="center"><img src="assets/logo.png" width="600"></div>
+
+ <img src="assets/demo.png" >
+
+ ## <div align="center">Introduction</div>
+ YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between the research and industrial communities.
+
+
+ ## <div align="center">Why YOLOX?</div>
+
+ <div align="center"><img src="assets/fig1.png" width="400" ><img src="assets/fig2.png" width="400"></div>
+
+ ## <div align="center">News!!</div>
+ * 【2021/07/19】 We have released our technical report on [Arxiv](xxx)!
+
+ ## <div align="center">Benchmark</div>
+
+ ### Standard Models.
+ |Model |size |mAP<sup>test<br>0.5:0.95 | Speed V100<br>(ms) | Params<br>(M) |FLOPs<br>(B)| weights |
+ | ------ |:---: | :---: |:---: |:---: | :---: | :----: |
+ |[YOLOX-s]() |640 |39.6 |9.8 |9.0 | 26.8 | - |
+ |[YOLOX-m]() |640 |46.4 |12.3 |25.3 |73.8| - |
+ |[YOLOX-l]() |640 |50.0 |14.5 |54.2| 155.6 | - |
+ |[YOLOX-x]() |640 |**51.2** | 17.3 |99.1 |281.9 | - |
+
+ ### Light Models.
+ |Model |size |mAP<sup>val<br>0.5:0.95 | Speed V100<br>(ms) | Params<br>(M) |FLOPs<br>(B)| weights |
+ | ------ |:---: | :---: |:---: |:---: | :---: | :----: |
+ |[YOLOX-Nano]() |416 |25.3 |- | 0.91 |1.08 | - |
+ |[YOLOX-Tiny]() |416 |31.7 |- | 5.06 |6.45 | - |
+
+ ## <div align="center">Quick Start</div>
+
+ ### Installation
+
+ Step1. Install [apex](https://github.com/NVIDIA/apex).
+
+ ```shell
+ git clone https://github.com/NVIDIA/apex
+ cd apex
+ pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
+ ```
+ Step2. Install YOLOX.
+ ```bash
+ $ git clone git@github.com:Megvii-BaseDetection/YOLOX.git
+ $ cd YOLOX
+ $ pip3 install -v -e .  # or "python3 setup.py develop"
+ ```
+
+ ### Demo
+
+ You can use either -n or -f to specify your detector's config:
+
+ ```shell
+ python tools/demo.py -n yolox-s -c <MODEL_PATH> --conf 0.3 --nms 0.65 --tsize 640
+ ```
+ or
+ ```shell
+ python tools/demo.py -f exps/base/yolox_s.py -c <MODEL_PATH> --conf 0.3 --nms 0.65 --tsize 640
+ ```
+
+
+ <details open>
+ <summary>Reproduce our results on COCO</summary>
+
+ Step1.
+
+ * Reproduce our results on COCO by specifying -n:
+
+ ```shell
+ python tools/train.py -n yolox-s -d 8 -b 64 --fp16 -o
+                          yolox-m
+                          yolox-l
+                          yolox-x
+ ```
+ Notes:
+ * -d: number of GPU devices
+ * -b: total batch size; the recommended value for -b is num_gpu * 8
+ * --fp16: mixed precision training
+
+ The above commands are equivalent to:
+
+ ```shell
+ python tools/train.py -f exps/base/yolox_s.py -d 8 -b 64 --fp16 -o
+                          exps/base/yolox_m.py
+                          exps/base/yolox_l.py
+                          exps/base/yolox_x.py
+ ```
+
+ * Customize your training (a minimal experiment-file sketch is shown after this section).
+
+ * Fine-tune COCO-pretrained models on your own dataset.
+ </details>
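For the "customize your training" item above, a minimal, hedged sketch of what a custom experiment file passed to `-f` can look like. It assumes the `yolox.exp.Exp` base class and the attribute names used by the `exps/base` configs, which may differ in your checkout:

```python
# Hypothetical exps/base/your_exp.py: a minimal custom experiment for "-f".
# Attribute names (depth, width, num_classes, max_epoch) are assumptions based on yolox.exp.Exp.
from yolox.exp import Exp as BaseExp


class Exp(BaseExp):
    def __init__(self):
        super().__init__()
        self.depth = 0.33     # YOLOX-s depth multiplier
        self.width = 0.50     # YOLOX-s width multiplier
        self.num_classes = 3  # class count of your own dataset
        self.max_epoch = 100  # adjust the schedule as needed
```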
+
+ <details open>
+ <summary>Evaluation</summary>
+ We support batch testing for fast evaluation:
+
+ ```shell
+ python tools/eval.py -n yolox-s -b 64 --conf 0.001 --fp16 (optional) --fuse (optional) --test (for test-dev set)
+                         yolox-m
+                         yolox-l
+                         yolox-x
+ ```
+
+ To reproduce the speed test, we use the following command:
+ ```shell
+ python tools/eval.py -n yolox-s -b 1 -d 0 --conf 0.001 --fp16 --fuse --test (for test-dev set)
+                         yolox-m
+                         yolox-l
+                         yolox-x
+ ```
+ </details>
+
+ ## <div align="center">Deployment</div>
+
+ 1. [ONNX: including ONNX export and an ONNXRuntime demo.]()
+ 2. [TensorRT in both C++ and Python]()
+ 3. [NCNN in C++]()
+ 4. [OpenVINO in both C++ and Python]()
+
+ ## <div align="center">Cite Our Work</div>
+
+
+ If you find this project useful, please cite it with the following BibTeX entry.
+
+ TODO
demo/ONNXRuntime/README.md ADDED
@@ -0,0 +1,66 @@
+ ## ONNXRuntime Demo in Python
+
+ This doc introduces how to convert your PyTorch model into ONNX, and how to run an ONNXRuntime demo to verify the conversion.
+
+ ### Download ONNX models.
+ | Model | Parameters | GFLOPs | Test Size | mAP |
+ |:------| :----: | :----: | :---: | :---: |
+ | [YOLOX-Nano](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.res101.fpn.coco.800size.1x) | 0.91M | 1.08 | 416x416 | 25.3 |
+ | [YOLOX-Tiny](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.fpn.coco.800size.1x) | 5.06M | 6.45 | 416x416 | 31.7 |
+ | [YOLOX-S](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 9.0M | 26.8 | 640x640 | 39.6 |
+ | [YOLOX-M](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 25.3M | 73.8 | 640x640 | 46.4 |
+ | [YOLOX-L](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 54.2M | 155.6 | 640x640 | 50.0 |
+ | [YOLOX-X](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 99.1M | 281.9 | 640x640 | 51.2 |
+ | [YOLOX-Darknet53](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 63.72M | 185.3 | 640x640 | 47.3 |
+
+ ### Convert Your Model to ONNX
+
+ First, move to <YOLOX_HOME>:
+ ```shell
+ cd <YOLOX_HOME>
+ ```
+ Then, you can:
+
+ 1. Convert a standard YOLOX model by -n:
+ ```shell
+ python3 tools/export_onnx.py --output-name yolox_s.onnx -n yolox-s -c yolox_s.pth.tar
+ ```
+ Notes:
+ * -n: specify a model name. The model name must be one of [yolox-s, yolox-m, yolox-l, yolox-x, yolox-nano, yolox-tiny, yolov3]
+ * -c: the checkpoint of the model you have trained
+ * -o: opset version, default 11. **However, if you will further convert your onnx model to [OpenVINO](), please specify the opset version to 10.**
+ * --no-onnxsim: disable onnxsim
+ * To customize the input shape of the onnx model, modify the following code in tools/export_onnx.py:
+
+ ```python
+ dummy_input = torch.randn(1, 3, exp.test_size[0], exp.test_size[1])
+ ```
+
+ 2. Convert a standard YOLOX model by -f. By using -f, the above command is equivalent to:
+
+ ```shell
+ python3 tools/export_onnx.py --output-name yolox_s.onnx -f exps/yolox_s.py -c yolox_s.pth.tar
+ ```
+
+ 3. To convert your customized model, please use -f:
+
+ ```shell
+ python3 tools/export_onnx.py --output-name your_yolox.onnx -f exps/your_yolox.py -c your_yolox.pth.tar
+ ```
+
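To sanity-check an exported model before running the demo below, you can load it with onnxruntime and push a random input through it. A minimal sketch, assuming the `yolox_s.onnx` produced by the export command above:

```python
# Quick check that the exported ONNX model loads and produces the expected output shape.
import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession("yolox_s.onnx")
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
(output,) = session.run(None, {session.get_inputs()[0].name: dummy})
# Each row is [cx, cy, w, h, objectness, 80 class scores] before decoding and NMS.
print(output.shape)  # e.g. (1, 8400, 85) for a 640x640 input
```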
+ ### ONNXRuntime Demo
+
+ Step1.
+ ```shell
+ cd <YOLOX_HOME>/demo/ONNXRuntime/
+ ```
+
+ Step2.
+ ```shell
+ python3 onnx_inference.py -m <ONNX_MODEL_PATH> -i <IMAGE_PATH> -o <OUTPUT_DIR> -s 0.3 --input_shape 640,640
+ ```
+ Notes:
+ * -m: your converted onnx model
+ * -i: path to your input image
+ * -s: score threshold for visualization.
+ * --input_shape: should be consistent with the shape you used for onnx conversion.
demo/ONNXRuntime/demo_utils.py ADDED
@@ -0,0 +1,86 @@
1
+ import numpy as np
2
+
3
+ import os
4
+
5
+
6
+ def mkdir(path):
7
+ if not os.path.exists(path):
8
+ os.makedirs(path)
9
+
10
+
11
+ def nms(boxes, scores, nms_thr):
12
+ """Single class NMS implemented in Numpy."""
13
+ x1 = boxes[:, 0]
14
+ y1 = boxes[:, 1]
15
+ x2 = boxes[:, 2]
16
+ y2 = boxes[:, 3]
17
+
18
+ areas = (x2 - x1 + 1) * (y2 - y1 + 1)
19
+ order = scores.argsort()[::-1]
20
+
21
+ keep = []
22
+ while order.size > 0:
23
+ i = order[0]
24
+ keep.append(i)
25
+ xx1 = np.maximum(x1[i], x1[order[1:]])
26
+ yy1 = np.maximum(y1[i], y1[order[1:]])
27
+ xx2 = np.minimum(x2[i], x2[order[1:]])
28
+ yy2 = np.minimum(y2[i], y2[order[1:]])
29
+
30
+ w = np.maximum(0.0, xx2 - xx1 + 1)
31
+ h = np.maximum(0.0, yy2 - yy1 + 1)
32
+ inter = w * h
33
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
34
+
35
+ inds = np.where(ovr <= nms_thr)[0]
36
+ order = order[inds + 1]
37
+
38
+ return keep
39
+
40
+
41
+ def multiclass_nms(boxes, scores, nms_thr, score_thr):
42
+ """Multiclass NMS implemented in Numpy"""
43
+ final_dets = []
44
+ num_classes = scores.shape[1]
45
+ for cls_ind in range(num_classes):
46
+ cls_scores = scores[:, cls_ind]
47
+ valid_score_mask = cls_scores > score_thr
48
+ if valid_score_mask.sum() == 0:
49
+ continue
50
+ else:
51
+ valid_scores = cls_scores[valid_score_mask]
52
+ valid_boxes = boxes[valid_score_mask]
53
+ keep = nms(valid_boxes, valid_scores, nms_thr)
54
+ if len(keep) > 0:
55
+ cls_inds = np.ones((len(keep), 1)) * cls_ind
56
+ dets = np.concatenate([valid_boxes[keep], valid_scores[keep, None], cls_inds], 1)
57
+ final_dets.append(dets)
58
+ return np.concatenate(final_dets, 0)
59
+
60
+
61
+ def postprocess(outputs, img_size, p6=False):
62
+
63
+ grids = []
64
+ expanded_strides = []
65
+
66
+ if not p6:
67
+ strides = [8, 16, 32]
68
+ else:
69
+ strides = [8, 16, 32, 64]
70
+
71
+ hsizes = [img_size[0]//stride for stride in strides]
72
+ wsizes = [img_size[1]//stride for stride in strides]
73
+
74
+ for hsize, wsize, stride in zip(hsizes, wsizes, strides):
75
+ xv, yv = np.meshgrid(np.arange(hsize), np.arange(wsize))
76
+ grid = np.stack((xv, yv), 2).reshape(1, -1, 2)
77
+ grids.append(grid)
78
+ shape = grid.shape[:2]
79
+ expanded_strides.append(np.full((*shape, 1), stride))
80
+
81
+ grids = np.concatenate(grids, 1)
82
+ expanded_strides = np.concatenate(expanded_strides, 1)
83
+ outputs[..., :2] = (outputs[..., :2] + grids) * expanded_strides
84
+ outputs[..., 2:4] = np.exp(outputs[..., 2:4]) * expanded_strides
85
+
86
+ return outputs
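For intuition on the decode step in `postprocess` above: a raw prediction (tx, ty, tw, th) at grid cell (gx, gy) on a level with stride s maps to a box center of ((tx + gx) * s, (ty + gy) * s) and a size of (exp(tw) * s, exp(th) * s). A tiny illustrative check with made-up numbers:

```python
# Illustrative only: decode one fake prediction from the stride-32 level.
import numpy as np

tx, ty, tw, th = 0.5, 0.25, 0.0, 0.7             # raw network outputs (made up)
gx, gy, stride = 2, 3, 32                        # grid cell and level stride
cx, cy = (tx + gx) * stride, (ty + gy) * stride  # center: (80.0, 104.0)
w, h = np.exp(tw) * stride, np.exp(th) * stride  # size: (32.0, ~64.4)
print(cx, cy, w, h)
```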
demo/ONNXRuntime/onnx_inference.py ADDED
@@ -0,0 +1,90 @@
1
+ import cv2
2
+ import numpy as np
3
+
4
+ from yolox.data.data_augment import preproc as preprocess
5
+ from yolox.data.datasets import COCO_CLASSES
6
+ from yolox.utils.visualize import vis
7
+
8
+ import argparse
9
+ import onnxruntime
10
+ import os
11
+ from demo_utils import mkdir, multiclass_nms, postprocess
12
+
13
+
14
+ def make_parser():
15
+ parser = argparse.ArgumentParser("onnxruntime inference sample")
16
+ parser.add_argument(
17
+ "-m",
18
+ "--model",
19
+ type=str,
20
+ default="yolox.onnx",
21
+ help="Input your onnx model.",
22
+ )
23
+ parser.add_argument(
24
+ "-i",
25
+ "--image_path",
26
+ type=str,
27
+ default='test_image.png',
28
+ help="Path to your input image.",
29
+ )
30
+ parser.add_argument(
31
+ "-o",
32
+ "--output_dir",
33
+ type=str,
34
+ default='demo_output',
35
+ help="Path to your output directory.",
36
+ )
37
+ parser.add_argument(
38
+ "-s",
39
+ "--score_thr",
40
+ type=float,
41
+ default=0.3,
42
+ help="Score threshould to filter the result.",
43
+ )
44
+ parser.add_argument(
45
+ "--input_shape",
46
+ type=str,
47
+ default="640,640",
48
+ help="Specify an input shape for inference.",
49
+ )
50
+ parser.add_argument(
51
+ "--with_p6",
52
+ action="store_true",
53
+ help="Whether your model uses p6 in FPN/PAN.",
54
+ )
55
+ return parser
56
+
57
+
58
+ if __name__ == '__main__':
59
+ args = make_parser().parse_args()
60
+
61
+ input_shape = tuple(map(int, args.input_shape.split(',')))
62
+ origin_img = cv2.imread(args.image_path)
63
+ mean = (0.485, 0.456, 0.406)
64
+ std = (0.229, 0.224, 0.225)
65
+ img, ratio = preprocess(origin_img, input_shape, mean, std)
66
+
67
+ session = onnxruntime.InferenceSession(args.model)
68
+
69
+ ort_inputs = {session.get_inputs()[0].name: img[None, :, :, :]}
70
+ output = session.run(None, ort_inputs)
71
+ predictions = postprocess(output[0], input_shape, p6=args.with_p6)[0]
72
+
73
+ boxes = predictions[:, :4]
74
+ scores = predictions[:, 4:5] * predictions[:, 5:]
75
+
76
+ boxes_xyxy = np.ones_like(boxes)
77
+ boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2]/2.
78
+ boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3]/2.
79
+ boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2]/2.
80
+ boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3]/2.
81
+ boxes_xyxy /= ratio
82
+ dets = multiclass_nms(boxes_xyxy, scores, nms_thr=0.65, score_thr=0.1)
83
+
84
+ final_boxes, final_scores, final_cls_inds = dets[:, :4], dets[:, 4], dets[:, 5]
85
+ origin_img = vis(origin_img, final_boxes, final_scores, final_cls_inds,
86
+ conf=args.score_thr, class_names=COCO_CLASSES)
87
+
88
+ mkdir(args.output_dir)
89
+ output_path = os.path.join(args.output_dir, args.image_path.split("/")[-1])
90
+ cv2.imwrite(output_path, origin_img)
demo/OpenVINO/README.md ADDED
@@ -0,0 +1,4 @@
+ ## YOLOX on OpenVINO
+
+ * [C++ Demo]()
+ * [Python Demo]()
demo/OpenVINO/cpp/CMakeLists.txt ADDED
@@ -0,0 +1,23 @@
1
+ cmake_minimum_required(VERSION 3.4.1)
2
+ set(CMAKE_CXX_STANDARD 14)
3
+
4
+ project(yolox_openvino_demo)
5
+
6
+ find_package(OpenCV REQUIRED)
7
+ find_package(InferenceEngine REQUIRED)
8
+ find_package(ngraph REQUIRED)
9
+
10
+ include_directories(
11
+ ${OpenCV_INCLUDE_DIRS}
12
+ ${CMAKE_CURRENT_SOURCE_DIR}
13
+ ${CMAKE_CURRENT_BINARY_DIR}
14
+ )
15
+
16
+ add_executable(yolox_openvino yolox_openvino.cpp)
17
+
18
+ target_link_libraries(
19
+ yolox_openvino
20
+ ${InferenceEngine_LIBRARIES}
21
+ ${NGRAPH_LIBRARIES}
22
+ ${OpenCV_LIBS}
23
+ )
demo/OpenVINO/cpp/README.md ADDED
@@ -0,0 +1,94 @@
+ # User Guide for Deploying YOLOX on OpenVINO
+
+ This tutorial includes a C++ demo for OpenVINO, as well as some converted models.
+
+ ### Download OpenVINO models.
+ | Model | Parameters | GFLOPs | Test Size | mAP |
+ |:------| :----: | :----: | :---: | :---: |
+ | [YOLOX-Nano](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.res101.fpn.coco.800size.1x) | 0.91M | 1.08 | 416x416 | 25.3 |
+ | [YOLOX-Tiny](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.fpn.coco.800size.1x) | 5.06M | 6.45 | 416x416 | 31.7 |
+ | [YOLOX-S](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 9.0M | 26.8 | 640x640 | 39.6 |
+ | [YOLOX-M](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 25.3M | 73.8 | 640x640 | 46.4 |
+ | [YOLOX-L](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 54.2M | 155.6 | 640x640 | 50.0 |
+ | [YOLOX-X](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 99.1M | 281.9 | 640x640 | 51.2 |
+ | [YOLOX-Darknet53](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 63.72M | 185.3 | 640x640 | 47.3 |
+
+ ## Install OpenVINO Toolkit
+
+ Please visit the [OpenVINO homepage](https://docs.openvinotoolkit.org/latest/get_started_guides.html) for more details.
+
+ ## Set up the Environment
+
+ ### For Linux
+
+ **Option 1. Set up the environment temporarily. You need to run this command every time you start a new shell window.**
+
+ ```shell
+ source /opt/intel/openvino_2021/bin/setupvars.sh
+ ```
+
+ **Option 2. Set up the environment permanently.**
+
+ *Step1.* For Linux:
+ ```shell
+ vim ~/.bashrc
+ ```
+
+ *Step2.* Add the following line to the file:
+
+ ```shell
+ source /opt/intel/openvino_2021/bin/setupvars.sh
+ ```
+
+ *Step3.* Save and exit the file, then run:
+
+ ```shell
+ source ~/.bashrc
+ ```
+
+
+ ## Convert model
+
+ 1. Export an ONNX model
+
+ Please refer to the [ONNX tutorial]() for more details. **Note that you should set --opset to 10, otherwise the next step will fail.**
+
+ 2. Convert ONNX to OpenVINO
+
+ ```shell
+ cd <INSTALL_DIR>/openvino_2021/deployment_tools/model_optimizer
+ ```
+
+ Install the requirements for the conversion tool:
+
+ ```shell
+ sudo ./install_prerequisites/install_prerequisites_onnx.sh
+ ```
+
+ Then convert the model:
+ ```shell
+ python3 mo.py --input_model <ONNX_MODEL> --input_shape <INPUT_SHAPE> [--data_type FP16]
+ ```
+ For example:
+ ```shell
+ python3 mo.py --input_model yolox.onnx --input_shape [1,3,640,640] --data_type FP16
+ ```
+
+ ## Build
+
+ ### Linux
+ ```shell
+ source /opt/intel/openvino_2021/bin/setupvars.sh
+ mkdir build
+ cd build
+ cmake ..
+ make
+ ```
+
+ ## Demo
+
+ ### C++
+
+ ```shell
+ ./yolox_openvino <XML_MODEL_PATH> <IMAGE_PATH> <DEVICE>
+ ```
demo/OpenVINO/cpp/yolox_openvino.cpp ADDED
@@ -0,0 +1,531 @@
1
+ // Copyright (C) 2018-2021 Intel Corporation
2
+ // SPDX-License-Identifier: Apache-2.0
3
+ //
4
+
5
+ #include <iterator>
6
+ #include <memory>
7
+ #include <string>
8
+ #include <vector>
9
+ #include <opencv2/opencv.hpp>
10
+ #include <iostream>
11
+ #include <inference_engine.hpp>
12
+
13
+ using namespace InferenceEngine;
14
+
15
+ /**
16
+ * @brief Define names based depends on Unicode path support
17
+ */
18
+ #define tcout std::cout
19
+ #define file_name_t std::string
20
+ #define imread_t cv::imread
21
+ #define NMS_THRESH 0.65
22
+ #define BBOX_CONF_THRESH 0.3
23
+
24
+ static const int INPUT_W = 416;
25
+ static const int INPUT_H = 416;
26
+
27
+ cv::Mat static_resize(cv::Mat& img) {
28
+ float r = std::min(INPUT_W / (img.cols*1.0), INPUT_H / (img.rows*1.0));
29
+ // r = std::min(r, 1.0f);
30
+ int unpad_w = r * img.cols;
31
+ int unpad_h = r * img.rows;
32
+ cv::Mat re(unpad_h, unpad_w, CV_8UC3);
33
+ cv::resize(img, re, re.size());
34
+ cv::Mat out(INPUT_H, INPUT_W, CV_8UC3, cv::Scalar(114, 114, 114)); // cv::Mat takes (rows, cols)
35
+ re.copyTo(out(cv::Rect(0, 0, re.cols, re.rows)));
36
+ return out;
37
+ }
38
+
39
+ void blobFromImage(cv::Mat& img, Blob::Ptr& blob){
40
+ cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
41
+ int channels = 3;
42
+ int img_h = img.rows;
43
+ int img_w = img.cols;
44
+ std::vector<float> mean = {0.485, 0.456, 0.406};
45
+ std::vector<float> std = {0.229, 0.224, 0.225};
46
+ InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
47
+ if (!mblob)
48
+ {
49
+ THROW_IE_EXCEPTION << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, "
50
+ << "but by fact we were not able to cast inputBlob to MemoryBlob";
51
+ }
52
+ // locked memory holder should be alive all time while access to its buffer happens
53
+ auto mblobHolder = mblob->wmap();
54
+
55
+ float *blob_data = mblobHolder.as<float *>();
56
+
57
+ for (size_t c = 0; c < channels; c++)
58
+ {
59
+ for (size_t h = 0; h < img_h; h++)
60
+ {
61
+ for (size_t w = 0; w < img_w; w++)
62
+ {
63
+ blob_data[c * img_w * img_h + h * img_w + w] =
64
+ (((float)img.at<cv::Vec3b>(h, w)[c]) / 255.0f - mean[c]) / std[c];
65
+ }
66
+ }
67
+ }
68
+ }
69
+
70
+
71
+ struct Object
72
+ {
73
+ cv::Rect_<float> rect;
74
+ int label;
75
+ float prob;
76
+ };
77
+
78
+ struct GridAndStride
79
+ {
80
+ int grid0;
81
+ int grid1;
82
+ int stride;
83
+ };
84
+
85
+ static void generate_grids_and_stride(const int target_size, std::vector<int>& strides, std::vector<GridAndStride>& grid_strides)
86
+ {
87
+ for (auto stride : strides)
88
+ {
89
+ int num_grid = target_size / stride;
90
+ for (int g1 = 0; g1 < num_grid; g1++)
91
+ {
92
+ for (int g0 = 0; g0 < num_grid; g0++)
93
+ {
94
+ grid_strides.push_back((GridAndStride){g0, g1, stride});
95
+ }
96
+ }
97
+ }
98
+ }
99
+
100
+
101
+ static void generate_yolox_proposals(std::vector<GridAndStride> grid_strides, const float* feat_ptr, float prob_threshold, std::vector<Object>& objects)
102
+ {
103
+ const int num_class = 80; // COCO has 80 classes. Modify this value on your own dataset.
104
+
105
+ const int num_anchors = grid_strides.size();
106
+
107
+ for (int anchor_idx = 0; anchor_idx < num_anchors; anchor_idx++)
108
+ {
109
+ const int grid0 = grid_strides[anchor_idx].grid0;
110
+ const int grid1 = grid_strides[anchor_idx].grid1;
111
+ const int stride = grid_strides[anchor_idx].stride;
112
+
113
+ const int basic_pos = anchor_idx * 85;
114
+
115
+ // yolox/models/yolo_head.py decode logic
116
+ // outputs[..., :2] = (outputs[..., :2] + grids) * strides
117
+ // outputs[..., 2:4] = torch.exp(outputs[..., 2:4]) * strides
118
+ float x_center = (feat_ptr[basic_pos + 0] + grid0) * stride;
119
+ float y_center = (feat_ptr[basic_pos + 1] + grid1) * stride;
120
+ float w = exp(feat_ptr[basic_pos + 2]) * stride;
121
+ float h = exp(feat_ptr[basic_pos + 3]) * stride;
122
+ float x0 = x_center - w * 0.5f;
123
+ float y0 = y_center - h * 0.5f;
124
+
125
+ float box_objectness = feat_ptr[basic_pos + 4];
126
+ for (int class_idx = 0; class_idx < num_class; class_idx++)
127
+ {
128
+ float box_cls_score = feat_ptr[basic_pos + 5 + class_idx];
129
+ float box_prob = box_objectness * box_cls_score;
130
+ if (box_prob > prob_threshold)
131
+ {
132
+ Object obj;
133
+ obj.rect.x = x0;
134
+ obj.rect.y = y0;
135
+ obj.rect.width = w;
136
+ obj.rect.height = h;
137
+ obj.label = class_idx;
138
+ obj.prob = box_prob;
139
+
140
+ objects.push_back(obj);
141
+ }
142
+
143
+ } // class loop
144
+
145
+ } // point anchor loop
146
+ }
147
+
148
+ static inline float intersection_area(const Object& a, const Object& b)
149
+ {
150
+ cv::Rect_<float> inter = a.rect & b.rect;
151
+ return inter.area();
152
+ }
153
+
154
+ static void qsort_descent_inplace(std::vector<Object>& faceobjects, int left, int right)
155
+ {
156
+ int i = left;
157
+ int j = right;
158
+ float p = faceobjects[(left + right) / 2].prob;
159
+
160
+ while (i <= j)
161
+ {
162
+ while (faceobjects[i].prob > p)
163
+ i++;
164
+
165
+ while (faceobjects[j].prob < p)
166
+ j--;
167
+
168
+ if (i <= j)
169
+ {
170
+ // swap
171
+ std::swap(faceobjects[i], faceobjects[j]);
172
+
173
+ i++;
174
+ j--;
175
+ }
176
+ }
177
+
178
+ #pragma omp parallel sections
179
+ {
180
+ #pragma omp section
181
+ {
182
+ if (left < j) qsort_descent_inplace(faceobjects, left, j);
183
+ }
184
+ #pragma omp section
185
+ {
186
+ if (i < right) qsort_descent_inplace(faceobjects, i, right);
187
+ }
188
+ }
189
+ }
190
+
191
+
192
+ static void qsort_descent_inplace(std::vector<Object>& objects)
193
+ {
194
+ if (objects.empty())
195
+ return;
196
+
197
+ qsort_descent_inplace(objects, 0, objects.size() - 1);
198
+ }
199
+
200
+ static void nms_sorted_bboxes(const std::vector<Object>& faceobjects, std::vector<int>& picked, float nms_threshold)
201
+ {
202
+ picked.clear();
203
+
204
+ const int n = faceobjects.size();
205
+
206
+ std::vector<float> areas(n);
207
+ for (int i = 0; i < n; i++)
208
+ {
209
+ areas[i] = faceobjects[i].rect.area();
210
+ }
211
+
212
+ for (int i = 0; i < n; i++)
213
+ {
214
+ const Object& a = faceobjects[i];
215
+
216
+ int keep = 1;
217
+ for (int j = 0; j < (int)picked.size(); j++)
218
+ {
219
+ const Object& b = faceobjects[picked[j]];
220
+
221
+ // intersection over union
222
+ float inter_area = intersection_area(a, b);
223
+ float union_area = areas[i] + areas[picked[j]] - inter_area;
224
+ // float IoU = inter_area / union_area
225
+ if (inter_area / union_area > nms_threshold)
226
+ keep = 0;
227
+ }
228
+
229
+ if (keep)
230
+ picked.push_back(i);
231
+ }
232
+ }
233
+
234
+
235
+ static void decode_outputs(const float* prob, std::vector<Object>& objects, float scale, const int img_w, const int img_h) {
236
+ std::vector<Object> proposals;
237
+ std::vector<int> strides = {8, 16, 32};
238
+ std::vector<GridAndStride> grid_strides;
239
+
240
+ generate_grids_and_stride(INPUT_W, strides, grid_strides);
241
+ generate_yolox_proposals(grid_strides, prob, BBOX_CONF_THRESH, proposals);
242
+ qsort_descent_inplace(proposals);
243
+
244
+ std::vector<int> picked;
245
+ nms_sorted_bboxes(proposals, picked, NMS_THRESH);
246
+ int count = picked.size();
247
+ objects.resize(count);
248
+
249
+ for (int i = 0; i < count; i++)
250
+ {
251
+ objects[i] = proposals[picked[i]];
252
+
253
+ // adjust offset to original unpadded
254
+ float x0 = (objects[i].rect.x) / scale;
255
+ float y0 = (objects[i].rect.y) / scale;
256
+ float x1 = (objects[i].rect.x + objects[i].rect.width) / scale;
257
+ float y1 = (objects[i].rect.y + objects[i].rect.height) / scale;
258
+
259
+ // clip
260
+ x0 = std::max(std::min(x0, (float)(img_w - 1)), 0.f);
261
+ y0 = std::max(std::min(y0, (float)(img_h - 1)), 0.f);
262
+ x1 = std::max(std::min(x1, (float)(img_w - 1)), 0.f);
263
+ y1 = std::max(std::min(y1, (float)(img_h - 1)), 0.f);
264
+
265
+ objects[i].rect.x = x0;
266
+ objects[i].rect.y = y0;
267
+ objects[i].rect.width = x1 - x0;
268
+ objects[i].rect.height = y1 - y0;
269
+ }
270
+ }
271
+
272
+ const float color_list[80][3] =
273
+ {
274
+ {0.000, 0.447, 0.741},
275
+ {0.850, 0.325, 0.098},
276
+ {0.929, 0.694, 0.125},
277
+ {0.494, 0.184, 0.556},
278
+ {0.466, 0.674, 0.188},
279
+ {0.301, 0.745, 0.933},
280
+ {0.635, 0.078, 0.184},
281
+ {0.300, 0.300, 0.300},
282
+ {0.600, 0.600, 0.600},
283
+ {1.000, 0.000, 0.000},
284
+ {1.000, 0.500, 0.000},
285
+ {0.749, 0.749, 0.000},
286
+ {0.000, 1.000, 0.000},
287
+ {0.000, 0.000, 1.000},
288
+ {0.667, 0.000, 1.000},
289
+ {0.333, 0.333, 0.000},
290
+ {0.333, 0.667, 0.000},
291
+ {0.333, 1.000, 0.000},
292
+ {0.667, 0.333, 0.000},
293
+ {0.667, 0.667, 0.000},
294
+ {0.667, 1.000, 0.000},
295
+ {1.000, 0.333, 0.000},
296
+ {1.000, 0.667, 0.000},
297
+ {1.000, 1.000, 0.000},
298
+ {0.000, 0.333, 0.500},
299
+ {0.000, 0.667, 0.500},
300
+ {0.000, 1.000, 0.500},
301
+ {0.333, 0.000, 0.500},
302
+ {0.333, 0.333, 0.500},
303
+ {0.333, 0.667, 0.500},
304
+ {0.333, 1.000, 0.500},
305
+ {0.667, 0.000, 0.500},
306
+ {0.667, 0.333, 0.500},
307
+ {0.667, 0.667, 0.500},
308
+ {0.667, 1.000, 0.500},
309
+ {1.000, 0.000, 0.500},
310
+ {1.000, 0.333, 0.500},
311
+ {1.000, 0.667, 0.500},
312
+ {1.000, 1.000, 0.500},
313
+ {0.000, 0.333, 1.000},
314
+ {0.000, 0.667, 1.000},
315
+ {0.000, 1.000, 1.000},
316
+ {0.333, 0.000, 1.000},
317
+ {0.333, 0.333, 1.000},
318
+ {0.333, 0.667, 1.000},
319
+ {0.333, 1.000, 1.000},
320
+ {0.667, 0.000, 1.000},
321
+ {0.667, 0.333, 1.000},
322
+ {0.667, 0.667, 1.000},
323
+ {0.667, 1.000, 1.000},
324
+ {1.000, 0.000, 1.000},
325
+ {1.000, 0.333, 1.000},
326
+ {1.000, 0.667, 1.000},
327
+ {0.333, 0.000, 0.000},
328
+ {0.500, 0.000, 0.000},
329
+ {0.667, 0.000, 0.000},
330
+ {0.833, 0.000, 0.000},
331
+ {1.000, 0.000, 0.000},
332
+ {0.000, 0.167, 0.000},
333
+ {0.000, 0.333, 0.000},
334
+ {0.000, 0.500, 0.000},
335
+ {0.000, 0.667, 0.000},
336
+ {0.000, 0.833, 0.000},
337
+ {0.000, 1.000, 0.000},
338
+ {0.000, 0.000, 0.167},
339
+ {0.000, 0.000, 0.333},
340
+ {0.000, 0.000, 0.500},
341
+ {0.000, 0.000, 0.667},
342
+ {0.000, 0.000, 0.833},
343
+ {0.000, 0.000, 1.000},
344
+ {0.000, 0.000, 0.000},
345
+ {0.143, 0.143, 0.143},
346
+ {0.286, 0.286, 0.286},
347
+ {0.429, 0.429, 0.429},
348
+ {0.571, 0.571, 0.571},
349
+ {0.714, 0.714, 0.714},
350
+ {0.857, 0.857, 0.857},
351
+ {0.000, 0.447, 0.741},
352
+ {0.314, 0.717, 0.741},
353
+ {0.50, 0.5, 0}
354
+ };
355
+
356
+ static void draw_objects(const cv::Mat& bgr, const std::vector<Object>& objects)
357
+ {
358
+ static const char* class_names[] = {
359
+ "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
360
+ "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
361
+ "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
362
+ "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
363
+ "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
364
+ "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
365
+ "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
366
+ "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
367
+ "hair drier", "toothbrush"
368
+ };
369
+
370
+ cv::Mat image = bgr.clone();
371
+
372
+ for (size_t i = 0; i < objects.size(); i++)
373
+ {
374
+ const Object& obj = objects[i];
375
+
376
+ fprintf(stderr, "%d = %.5f at %.2f %.2f %.2f x %.2f\n", obj.label, obj.prob,
377
+ obj.rect.x, obj.rect.y, obj.rect.width, obj.rect.height);
378
+
379
+ cv::Scalar color = cv::Scalar(color_list[obj.label][0], color_list[obj.label][1], color_list[obj.label][2]);
380
+ float c_mean = cv::mean(color)[0];
381
+ cv::Scalar txt_color;
382
+ if (c_mean > 0.5){
383
+ txt_color = cv::Scalar(0, 0, 0);
384
+ }else{
385
+ txt_color = cv::Scalar(255, 255, 255);
386
+ }
387
+
388
+ cv::rectangle(image, obj.rect, color * 255, 2);
389
+
390
+ char text[256];
391
+ sprintf(text, "%s %.1f%%", class_names[obj.label], obj.prob * 100);
392
+
393
+ int baseLine = 0;
394
+ cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_COMPLEX, 0.4, 1, &baseLine);
395
+
396
+ cv::Scalar txt_bk_color = color * 0.7 * 255;
397
+
398
+ int x = obj.rect.x;
399
+ int y = obj.rect.y + 1;
400
+ //int y = obj.rect.y - label_size.height - baseLine;
401
+ if (y > image.rows)
402
+ y = image.rows;
403
+ //if (x + label_size.width > image.cols)
404
+ //x = image.cols - label_size.width;
405
+
406
+ cv::rectangle(image, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)),
407
+ txt_bk_color, -1);
408
+
409
+ cv::putText(image, text, cv::Point(x, y + label_size.height),
410
+ cv::FONT_HERSHEY_COMPLEX, 0.4, txt_color, 1);
411
+ }
412
+
413
+ cv::imwrite("_demo.jpg" , image);
414
+ fprintf(stderr, "save vis file\n");
415
+ /* cv::imshow("image", image); */
416
+ /* cv::waitKey(0); */
417
+ }
418
+
419
+
420
+ int main(int argc, char* argv[]) {
421
+ try {
422
+ // ------------------------------ Parsing and validation of input arguments
423
+ // ---------------------------------
424
+ if (argc != 4) {
425
+ tcout << "Usage : " << argv[0] << " <path_to_model> <path_to_image> <device_name>" << std::endl;
426
+ return EXIT_FAILURE;
427
+ }
428
+
429
+ const file_name_t input_model {argv[1]};
430
+ const file_name_t input_image_path {argv[2]};
431
+ const std::string device_name {argv[3]};
432
+ // -----------------------------------------------------------------------------------------------------
433
+
434
+ // --------------------------- Step 1. Initialize inference engine core
435
+ // -------------------------------------
436
+ Core ie;
437
+ // -----------------------------------------------------------------------------------------------------
438
+
439
+ // Step 2. Read a model in OpenVINO Intermediate Representation (.xml and
440
+ // .bin files) or ONNX (.onnx file) format
441
+ CNNNetwork network = ie.ReadNetwork(input_model);
442
+ if (network.getOutputsInfo().size() != 1)
443
+ throw std::logic_error("Sample supports topologies with 1 output only");
444
+ if (network.getInputsInfo().size() != 1)
445
+ throw std::logic_error("Sample supports topologies with 1 input only");
446
+ // -----------------------------------------------------------------------------------------------------
447
+
448
+ // --------------------------- Step 3. Configure input & output
449
+ // ---------------------------------------------
450
+ // --------------------------- Prepare input blobs
451
+ // -----------------------------------------------------
452
+ InputInfo::Ptr input_info = network.getInputsInfo().begin()->second;
453
+ std::string input_name = network.getInputsInfo().begin()->first;
454
+
455
+ /* Mark input as resizable by setting of a resize algorithm.
456
+ * In this case we will be able to set an input blob of any shape to an
457
+ * infer request. Resize and layout conversions are executed automatically
458
+ * during inference */
459
+ //input_info->getPreProcess().setResizeAlgorithm(RESIZE_BILINEAR);
460
+ //input_info->setLayout(Layout::NHWC);
461
+ //input_info->setPrecision(Precision::FP32);
462
+
463
+ // --------------------------- Prepare output blobs
464
+ // ----------------------------------------------------
465
+ if (network.getOutputsInfo().empty()) {
466
+ std::cerr << "Network outputs info is empty" << std::endl;
467
+ return EXIT_FAILURE;
468
+ }
469
+ DataPtr output_info = network.getOutputsInfo().begin()->second;
470
+ std::string output_name = network.getOutputsInfo().begin()->first;
471
+
472
+ output_info->setPrecision(Precision::FP32);
473
+ // -----------------------------------------------------------------------------------------------------
474
+
475
+ // --------------------------- Step 4. Loading a model to the device
476
+ // ------------------------------------------
477
+ ExecutableNetwork executable_network = ie.LoadNetwork(network, device_name);
478
+ // -----------------------------------------------------------------------------------------------------
479
+
480
+ // --------------------------- Step 5. Create an infer request
481
+ // -------------------------------------------------
482
+ InferRequest infer_request = executable_network.CreateInferRequest();
483
+ // -----------------------------------------------------------------------------------------------------
484
+
485
+ // --------------------------- Step 6. Prepare input
486
+ // --------------------------------------------------------
487
+ /* Read input image to a blob and set it to an infer request without resize
488
+ * and layout conversions. */
489
+ cv::Mat image = imread_t(input_image_path);
490
+ cv::Mat pr_img = static_resize(image);
491
+ Blob::Ptr imgBlob = infer_request.GetBlob(input_name); // just wrap Mat data by Blob::Ptr
492
+ blobFromImage(pr_img, imgBlob);
493
+
494
+ // infer_request.SetBlob(input_name, imgBlob); // infer_request accepts input blob of any size
495
+ // -----------------------------------------------------------------------------------------------------
496
+
497
+ // --------------------------- Step 7. Do inference
498
+ // --------------------------------------------------------
499
+ /* Running the request synchronously */
500
+ infer_request.Infer();
501
+ // -----------------------------------------------------------------------------------------------------
502
+
503
+ // --------------------------- Step 8. Process output
504
+ // ------------------------------------------------------
505
+ const Blob::Ptr output_blob = infer_request.GetBlob(output_name);
506
+ MemoryBlob::CPtr moutput = as<MemoryBlob>(output_blob);
507
+ if (!moutput) {
508
+ throw std::logic_error("We expect output to be inherited from MemoryBlob, "
509
+ "but by fact we were not able to cast output to MemoryBlob");
510
+ }
511
+ // locked memory holder should be alive all time while access to its buffer
512
+ // happens
513
+ auto moutputHolder = moutput->rmap();
514
+ const float* net_pred = moutputHolder.as<const PrecisionTrait<Precision::FP32>::value_type*>();
515
+
516
+ const int image_size = 416;
517
+ int img_w = image.cols;
518
+ int img_h = image.rows;
519
+ float scale = std::min(INPUT_W / (image.cols*1.0), INPUT_H / (image.rows*1.0));
520
+ std::vector<Object> objects;
521
+
522
+ decode_outputs(net_pred, objects, scale, img_w, img_h);
523
+ draw_objects(image, objects);
524
+
525
+ // -----------------------------------------------------------------------------------------------------
526
+ } catch (const std::exception& ex) {
527
+ std::cerr << ex.what() << std::endl;
528
+ return EXIT_FAILURE;
529
+ }
530
+ return EXIT_SUCCESS;
531
+ }
demo/OpenVINO/python/README.md ADDED
@@ -0,0 +1,88 @@
+ # User Guide for Deploying YOLOX on OpenVINO
+
+ This tutorial includes a Python demo for OpenVINO, as well as some converted models.
+
+ ### Download OpenVINO models.
+ | Model | Parameters | GFLOPs | Test Size | mAP |
+ |:------| :----: | :----: | :---: | :---: |
+ | [YOLOX-Nano](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.res101.fpn.coco.800size.1x) | 0.91M | 1.08 | 416x416 | 25.3 |
+ | [YOLOX-Tiny](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.fpn.coco.800size.1x) | 5.06M | 6.45 | 416x416 | 31.7 |
+ | [YOLOX-S](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 9.0M | 26.8 | 640x640 | 39.6 |
+ | [YOLOX-M](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 25.3M | 73.8 | 640x640 | 46.4 |
+ | [YOLOX-L](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 54.2M | 155.6 | 640x640 | 50.0 |
+ | [YOLOX-X](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 99.1M | 281.9 | 640x640 | 51.2 |
+ | [YOLOX-Darknet53](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 63.72M | 185.3 | 640x640 | 47.3 |
+
+ ## Install OpenVINO Toolkit
+
+ Please visit the [OpenVINO homepage](https://docs.openvinotoolkit.org/latest/get_started_guides.html) for more details.
+
+ ## Set up the Environment
+
+ ### For Linux
+
+ **Option 1. Set up the environment temporarily. You need to run this command every time you start a new shell window.**
+
+ ```shell
+ source /opt/intel/openvino_2021/bin/setupvars.sh
+ ```
+
+ **Option 2. Set up the environment permanently.**
+
+ *Step1.* For Linux:
+ ```shell
+ vim ~/.bashrc
+ ```
+
+ *Step2.* Add the following line to the file:
+
+ ```shell
+ source /opt/intel/openvino_2021/bin/setupvars.sh
+ ```
+
+ *Step3.* Save and exit the file, then run:
+
+ ```shell
+ source ~/.bashrc
+ ```
+
+
+ ## Convert model
+
+ 1. Export an ONNX model
+
+ Please refer to the [ONNX tutorial]() for more details. **Note that you should set --opset to 10, otherwise the next step will fail.**
+
+ 2. Convert ONNX to OpenVINO
+
+ ```shell
+ cd <INSTALL_DIR>/openvino_2021/deployment_tools/model_optimizer
+ ```
+
+ Install the requirements for the conversion tool:
+
+ ```shell
+ sudo ./install_prerequisites/install_prerequisites_onnx.sh
+ ```
+
+ Then convert the model:
+ ```shell
+ python3 mo.py --input_model <ONNX_MODEL> --input_shape <INPUT_SHAPE> [--data_type FP16]
+ ```
+ For example:
+ ```shell
+ python3 mo.py --input_model yolox.onnx --input_shape [1,3,640,640] --data_type FP16
+ ```
+
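Before running the demo, you can quickly verify that the generated IR loads and report its input shape. A minimal sketch, assuming the `yolox.xml`/`yolox.bin` files produced by mo.py above:

```python
# Minimal sanity check for the converted IR, using the same Inference Engine API as the demo.
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="yolox.xml", weights="yolox.bin")
input_blob = next(iter(net.input_info))
print(input_blob, net.input_info[input_blob].input_data.shape)  # e.g. (1, 3, 640, 640)
```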
+ ## Demo
+
+ ### Python
+
+ ```shell
+ python openvino_inference.py -m <XML_MODEL_PATH> -i <IMAGE_PATH>
+ ```
+ or
+ ```shell
+ python openvino_inference.py -m <XML_MODEL_PATH> -i <IMAGE_PATH> -o <OUTPUT_DIR> -s <SCORE_THR> -d <DEVICE>
+ ```
+
+
demo/OpenVINO/python/demo_utils.py ADDED
@@ -0,0 +1,86 @@
1
+ import numpy as np
2
+
3
+ import os
4
+
5
+
6
+ def mkdir(path):
7
+ if not os.path.exists(path):
8
+ os.makedirs(path)
9
+
10
+
11
+ def nms(boxes, scores, nms_thr):
12
+ """Single class NMS implemented in Numpy."""
13
+ x1 = boxes[:, 0]
14
+ y1 = boxes[:, 1]
15
+ x2 = boxes[:, 2]
16
+ y2 = boxes[:, 3]
17
+
18
+ areas = (x2 - x1 + 1) * (y2 - y1 + 1)
19
+ order = scores.argsort()[::-1]
20
+
21
+ keep = []
22
+ while order.size > 0:
23
+ i = order[0]
24
+ keep.append(i)
25
+ xx1 = np.maximum(x1[i], x1[order[1:]])
26
+ yy1 = np.maximum(y1[i], y1[order[1:]])
27
+ xx2 = np.minimum(x2[i], x2[order[1:]])
28
+ yy2 = np.minimum(y2[i], y2[order[1:]])
29
+
30
+ w = np.maximum(0.0, xx2 - xx1 + 1)
31
+ h = np.maximum(0.0, yy2 - yy1 + 1)
32
+ inter = w * h
33
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
34
+
35
+ inds = np.where(ovr <= nms_thr)[0]
36
+ order = order[inds + 1]
37
+
38
+ return keep
39
+
40
+
41
+ def multiclass_nms(boxes, scores, nms_thr, score_thr):
42
+ """Multiclass NMS implemented in Numpy"""
43
+ final_dets = []
44
+ num_classes = scores.shape[1]
45
+ for cls_ind in range(num_classes):
46
+ cls_scores = scores[:, cls_ind]
47
+ valid_score_mask = cls_scores > score_thr
48
+ if valid_score_mask.sum() == 0:
49
+ continue
50
+ else:
51
+ valid_scores = cls_scores[valid_score_mask]
52
+ valid_boxes = boxes[valid_score_mask]
53
+ keep = nms(valid_boxes, valid_scores, nms_thr)
54
+ if len(keep) > 0:
55
+ cls_inds = np.ones((len(keep), 1)) * cls_ind
56
+ dets = np.concatenate([valid_boxes[keep], valid_scores[keep, None], cls_inds], 1)
57
+ final_dets.append(dets)
58
+ return np.concatenate(final_dets, 0)
59
+
60
+
61
+ def postprocess(outputs, img_size, p6=False):
62
+
63
+ grids = []
64
+ expanded_strides = []
65
+
66
+ if not p6:
67
+ strides = [8, 16, 32]
68
+ else:
69
+ strides = [8, 16, 32, 64]
70
+
71
+ hsizes = [img_size[0]//stride for stride in strides]
72
+ wsizes = [img_size[1]//stride for stride in strides]
73
+
74
+ for hsize, wsize, stride in zip(hsizes, wsizes, strides):
75
+ xv, yv = np.meshgrid(np.arange(hsize), np.arange(wsize))
76
+ grid = np.stack((xv, yv), 2).reshape(1, -1, 2)
77
+ grids.append(grid)
78
+ shape = grid.shape[:2]
79
+ expanded_strides.append(np.full((*shape, 1), stride))
80
+
81
+ grids = np.concatenate(grids, 1)
82
+ expanded_strides = np.concatenate(expanded_strides, 1)
83
+ outputs[..., :2] = (outputs[..., :2] + grids) * expanded_strides
84
+ outputs[..., 2:4] = np.exp(outputs[..., 2:4]) * expanded_strides
85
+
86
+ return outputs
demo/OpenVINO/python/openvino_inference.py ADDED
@@ -0,0 +1,155 @@
1
+ #!/usr/bin/env python3
2
+ # -*- coding: utf-8 -*-
3
+ # Copyright (C) 2018-2021 Intel Corporation
4
+ # SPDX-License-Identifier: Apache-2.0
5
+ import argparse
6
+ import logging as log
7
+ import os
8
+ import sys
9
+
10
+ import cv2
11
+ import numpy as np
12
+
13
+ from demo_utils import mkdir, multiclass_nms, postprocess
14
+ from openvino.inference_engine import IECore
15
+ from yolox.data.data_augment import preproc as preprocess
16
+ from yolox.data.datasets import COCO_CLASSES
17
+ from yolox.utils.visualize import vis
18
+
19
+
20
+ def parse_args() -> argparse.Namespace:
21
+ """Parse and return command line arguments"""
22
+ parser = argparse.ArgumentParser(add_help=False)
23
+ args = parser.add_argument_group('Options')
24
+ args.add_argument(
25
+ '-h',
26
+ '--help',
27
+ action='help',
28
+ help='Show this help message and exit.')
29
+ args.add_argument(
30
+ '-m',
31
+ '--model',
32
+ required=True,
33
+ type=str,
34
+ help='Required. Path to an .xml or .onnx file with a trained model.')
35
+ args.add_argument(
36
+ '-i',
37
+ '--input',
38
+ required=True,
39
+ type=str,
40
+ help='Required. Path to an image file.')
41
+ args.add_argument(
42
+ '-o',
43
+ '--output_dir',
44
+ type=str,
45
+ default='demo_output',
46
+ help='Path to your output dir.')
47
+ args.add_argument(
48
+ '-s',
49
+ '--score_thr',
50
+ type=float,
51
+ default=0.3,
52
+ help="Score threshould to visualize the result.")
53
+ args.add_argument(
54
+ '-d',
55
+ '--device',
56
+ default='CPU',
57
+ type=str,
58
+ help='Optional. Specify the target device to infer on; CPU, GPU, \
59
+ MYRIAD, HDDL or HETERO: is acceptable. The sample will look \
60
+ for a suitable plugin for device specified. Default value \
61
+ is CPU.')
62
+ args.add_argument(
63
+ '--labels',
64
+ default=None,
65
+ type=str,
66
+ help='Optional. Path to a labels mapping file.')
67
+ args.add_argument(
68
+ '-nt',
69
+ '--number_top',
70
+ default=10,
71
+ type=int,
72
+ help='Optional. Number of top results.')
73
+ return parser.parse_args()
74
+
75
+
76
+ def main():
77
+ log.basicConfig(format='[ %(levelname)s ] %(message)s', level=log.INFO, stream=sys.stdout)
78
+ args = parse_args()
79
+
80
+ # ---------------------------Step 1. Initialize inference engine core--------------------------------------------------
81
+ log.info('Creating Inference Engine')
82
+ ie = IECore()
83
+
84
+ # ---------------------------Step 2. Read a model in OpenVINO Intermediate Representation or ONNX format---------------
85
+ log.info(f'Reading the network: {args.model}')
86
+ # (.xml and .bin files) or (.onnx file)
87
+ net = ie.read_network(model=args.model)
88
+
89
+ if len(net.input_info) != 1:
90
+ log.error('Sample supports only single input topologies')
91
+ return -1
92
+ if len(net.outputs) != 1:
93
+ log.error('Sample supports only single output topologies')
94
+ return -1
95
+
96
+ # ---------------------------Step 3. Configure input & output----------------------------------------------------------
97
+ log.info('Configuring input and output blobs')
98
+ # Get names of input and output blobs
99
+ input_blob = next(iter(net.input_info))
100
+ out_blob = next(iter(net.outputs))
101
+
102
+ # Set input and output precision manually
103
+ net.input_info[input_blob].precision = 'FP32'
104
+ net.outputs[out_blob].precision = 'FP16'
105
+
106
+ # Get a number of classes recognized by a model
107
+ num_of_classes = max(net.outputs[out_blob].shape)
108
+
109
+ # ---------------------------Step 4. Loading model to the device-------------------------------------------------------
110
+ log.info('Loading the model to the plugin')
111
+ exec_net = ie.load_network(network=net, device_name=args.device)
112
+
113
+ # ---------------------------Step 5. Create infer request--------------------------------------------------------------
114
+ # load_network() method of the IECore class with a specified number of requests (default 1) returns an ExecutableNetwork
115
+ # instance which stores infer requests. So you already created Infer requests in the previous step.
116
+
117
+ # ---------------------------Step 6. Prepare input---------------------------------------------------------------------
118
+ origin_img = cv2.imread(args.input)
119
+ _, _, h, w = net.input_info[input_blob].input_data.shape
120
+ mean = (0.485, 0.456, 0.406)
121
+ std = (0.229, 0.224, 0.225)
122
+ image, ratio = preprocess(origin_img, (h, w), mean, std)
123
+
124
+ # ---------------------------Step 7. Do inference----------------------------------------------------------------------
125
+ log.info('Starting inference in synchronous mode')
126
+ res = exec_net.infer(inputs={input_blob: image})
127
+
128
+ # ---------------------------Step 8. Process output--------------------------------------------------------------------
129
+ res = res[out_blob]
130
+
131
+ predictions = postprocess(res, (h, w), p6=False)[0]
132
+
133
+ boxes = predictions[:, :4]
134
+ scores = predictions[:, 4, None] * predictions[:, 5:]
135
+
136
+ boxes_xyxy = np.ones_like(boxes)
137
+ boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2]/2.
138
+ boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3]/2.
139
+ boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2]/2.
140
+ boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3]/2.
141
+ boxes_xyxy /= ratio
142
+ dets = multiclass_nms(boxes_xyxy, scores, nms_thr=0.65, score_thr=0.1)
143
+
144
+ final_boxes = dets[:, :4]
145
+ final_scores, final_cls_inds = dets[:, 4], dets[:, 5]
146
+ origin_img = vis(origin_img, final_boxes, final_scores, final_cls_inds,
147
+ conf=args.score_thr, class_names=COCO_CLASSES)
148
+
149
+ mkdir(args.output_dir)
150
+ output_path = os.path.join(args.output_dir, args.input.split("/")[-1])
151
+ cv2.imwrite(output_path, origin_img)
152
+
153
+
154
+ if __name__ == '__main__':
155
+ sys.exit(main())
demo/TensorRT/cpp/CMakeLists.txt ADDED
@@ -0,0 +1,36 @@
1
+ cmake_minimum_required(VERSION 2.6)
2
+
3
+ project(yolox)
4
+
5
+ add_definitions(-std=c++11)
6
+
7
+ option(CUDA_USE_STATIC_CUDA_RUNTIME OFF)
8
+ set(CMAKE_CXX_STANDARD 11)
9
+ set(CMAKE_BUILD_TYPE Debug)
10
+
11
+ find_package(CUDA REQUIRED)
12
+
13
+ include_directories(${PROJECT_SOURCE_DIR}/include)
14
+ # include and link dirs of cuda and tensorrt, you need adapt them if yours are different
15
+ # cuda
16
+ include_directories(/data/cuda/cuda-10.2/cuda/include)
17
+ link_directories(/data/cuda/cuda-10.2/cuda/lib64)
18
+ # cudnn
19
+ include_directories(/data/cuda/cuda-10.2/cudnn/v8.0.4/include)
20
+ link_directories(/data/cuda/cuda-10.2/cudnn/v8.0.4/lib64)
21
+ # tensorrt
22
+ include_directories(/data/cuda/cuda-10.2/TensorRT/v7.2.1.6/include)
23
+ link_directories(/data/cuda/cuda-10.2/TensorRT/v7.2.1.6/lib)
24
+
25
+ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -Wall -Ofast -Wfatal-errors -D_MWAITXINTRIN_H_INCLUDED")
26
+
27
+ find_package(OpenCV)
28
+ include_directories(${OpenCV_INCLUDE_DIRS})
29
+
30
+ add_executable(yolox ${PROJECT_SOURCE_DIR}/yolox.cpp)
31
+ target_link_libraries(yolox nvinfer)
32
+ target_link_libraries(yolox cudart)
33
+ target_link_libraries(yolox ${OpenCV_LIBS})
34
+
35
+ add_definitions(-O2 -pthread)
36
+
demo/TensorRT/cpp/README.md ADDED
@@ -0,0 +1,43 @@
+ # User Guide for Deploying YOLOX on TensorRT (C++)
+
+ Since YOLOX models are easy to convert to TensorRT with the [torch2trt repo](https://github.com/NVIDIA-AI-IOT/torch2trt),
+ our C++ demo does not include model conversion or network construction, unlike other TensorRT demos.
+
+
+ ## Step 1: Prepare the serialized engine file
+
+ Follow the TensorRT [Python demo README](../Python/README.md) to convert and save the serialized engine file; a rough sketch of the conversion is shown below.
+
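For reference, a hedged sketch of what such a torch2trt conversion typically looks like; the experiment/checkpoint helpers and option names are assumptions based on the training tools above, and the linked Python demo README is authoritative:

```python
# Hypothetical sketch: convert a YOLOX model with torch2trt and save a serialized engine.
import torch
from torch2trt import torch2trt
from yolox.exp import get_exp

exp = get_exp(None, "yolox-s")
model = exp.get_model().eval().cuda()
ckpt = torch.load("yolox_s.pth.tar", map_location="cpu")
model.load_state_dict(ckpt["model"])  # checkpoint key name is an assumption

x = torch.ones(1, 3, exp.test_size[0], exp.test_size[1]).cuda()
model_trt = torch2trt(model, [x], fp16_mode=True)

# Serialize the underlying TensorRT engine to the file consumed by this C++ demo.
with open("model_trt.engine", "wb") as f:
    f.write(model_trt.engine.serialize())
```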
+ ## Step 2: Build the demo
+
+ Please follow the [TensorRT Installation Guide](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html) to install TensorRT.
+
+ Install OpenCV with ```sudo apt-get install libopencv-dev```.
+
+ Build the demo:
+
+ ```shell
+ mkdir build
+ cd build
+ cmake ..
+ make
+ ```
+
+ Move the `model_trt.engine` file generated in Step 1 (saved in the experiment output dir) to the build dir:
+
+ ```shell
+ mv /path/to/your/exp/output/dir/model_trt.engine .
+ ```
+
+ Then run the demo:
+
+ ```shell
+ ./yolox -d /your/path/to/yolox/assets
+ ```
+
+ or
+
+ ```shell
+ ./yolox -d <img dir>
+ ```
demo/TensorRT/cpp/logging.h ADDED
@@ -0,0 +1,503 @@
1
+ /*
2
+ * Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
3
+ *
4
+ * Licensed under the Apache License, Version 2.0 (the "License");
5
+ * you may not use this file except in compliance with the License.
6
+ * You may obtain a copy of the License at
7
+ *
8
+ * http://www.apache.org/licenses/LICENSE-2.0
9
+ *
10
+ * Unless required by applicable law or agreed to in writing, software
11
+ * distributed under the License is distributed on an "AS IS" BASIS,
12
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13
+ * See the License for the specific language governing permissions and
14
+ * limitations under the License.
15
+ */
16
+
17
+ #ifndef TENSORRT_LOGGING_H
18
+ #define TENSORRT_LOGGING_H
19
+
20
+ #include "NvInferRuntimeCommon.h"
21
+ #include <cassert>
22
+ #include <ctime>
23
+ #include <iomanip>
24
+ #include <iostream>
25
+ #include <ostream>
26
+ #include <sstream>
27
+ #include <string>
28
+
29
+ using Severity = nvinfer1::ILogger::Severity;
30
+
31
+ class LogStreamConsumerBuffer : public std::stringbuf
32
+ {
33
+ public:
34
+ LogStreamConsumerBuffer(std::ostream& stream, const std::string& prefix, bool shouldLog)
35
+ : mOutput(stream)
36
+ , mPrefix(prefix)
37
+ , mShouldLog(shouldLog)
38
+ {
39
+ }
40
+
41
+ LogStreamConsumerBuffer(LogStreamConsumerBuffer&& other)
42
+ : mOutput(other.mOutput)
43
+ {
44
+ }
45
+
46
+ ~LogStreamConsumerBuffer()
47
+ {
48
+ // std::streambuf::pbase() gives a pointer to the beginning of the buffered part of the output sequence
49
+ // std::streambuf::pptr() gives a pointer to the current position of the output sequence
50
+ // if the pointer to the beginning is not equal to the pointer to the current position,
51
+ // call putOutput() to log the output to the stream
52
+ if (pbase() != pptr())
53
+ {
54
+ putOutput();
55
+ }
56
+ }
57
+
58
+ // synchronizes the stream buffer and returns 0 on success
59
+ // synchronizing the stream buffer consists of inserting the buffer contents into the stream,
60
+ // resetting the buffer and flushing the stream
61
+ virtual int sync()
62
+ {
63
+ putOutput();
64
+ return 0;
65
+ }
66
+
67
+ void putOutput()
68
+ {
69
+ if (mShouldLog)
70
+ {
71
+ // prepend timestamp
72
+ std::time_t timestamp = std::time(nullptr);
73
+ tm* tm_local = std::localtime(&timestamp);
74
+ std::cout << "[";
75
+ std::cout << std::setw(2) << std::setfill('0') << 1 + tm_local->tm_mon << "/";
76
+ std::cout << std::setw(2) << std::setfill('0') << tm_local->tm_mday << "/";
77
+ std::cout << std::setw(4) << std::setfill('0') << 1900 + tm_local->tm_year << "-";
78
+ std::cout << std::setw(2) << std::setfill('0') << tm_local->tm_hour << ":";
79
+ std::cout << std::setw(2) << std::setfill('0') << tm_local->tm_min << ":";
80
+ std::cout << std::setw(2) << std::setfill('0') << tm_local->tm_sec << "] ";
81
+ // std::stringbuf::str() gets the string contents of the buffer
82
+ // insert the buffer contents pre-appended by the appropriate prefix into the stream
83
+ mOutput << mPrefix << str();
84
+ // set the buffer to empty
85
+ str("");
86
+ // flush the stream
87
+ mOutput.flush();
88
+ }
89
+ }
90
+
91
+ void setShouldLog(bool shouldLog)
92
+ {
93
+ mShouldLog = shouldLog;
94
+ }
95
+
96
+ private:
97
+ std::ostream& mOutput;
98
+ std::string mPrefix;
99
+ bool mShouldLog;
100
+ };
101
+
102
+ //!
103
+ //! \class LogStreamConsumerBase
104
+ //! \brief Convenience object used to initialize LogStreamConsumerBuffer before std::ostream in LogStreamConsumer
105
+ //!
106
+ class LogStreamConsumerBase
107
+ {
108
+ public:
109
+ LogStreamConsumerBase(std::ostream& stream, const std::string& prefix, bool shouldLog)
110
+ : mBuffer(stream, prefix, shouldLog)
111
+ {
112
+ }
113
+
114
+ protected:
115
+ LogStreamConsumerBuffer mBuffer;
116
+ };
117
+
118
+ //!
119
+ //! \class LogStreamConsumer
120
+ //! \brief Convenience object used to facilitate use of C++ stream syntax when logging messages.
121
+ //! Order of base classes is LogStreamConsumerBase and then std::ostream.
122
+ //! This is because the LogStreamConsumerBase class is used to initialize the LogStreamConsumerBuffer member field
123
+ //! in LogStreamConsumer and then the address of the buffer is passed to std::ostream.
124
+ //! This is necessary to prevent the address of an uninitialized buffer from being passed to std::ostream.
125
+ //! Please do not change the order of the parent classes.
126
+ //!
127
+ class LogStreamConsumer : protected LogStreamConsumerBase, public std::ostream
128
+ {
129
+ public:
130
+ //! \brief Creates a LogStreamConsumer which logs messages with level severity.
131
+ //! Reportable severity determines if the messages are severe enough to be logged.
132
+ LogStreamConsumer(Severity reportableSeverity, Severity severity)
133
+ : LogStreamConsumerBase(severityOstream(severity), severityPrefix(severity), severity <= reportableSeverity)
134
+ , std::ostream(&mBuffer) // links the stream buffer with the stream
135
+ , mShouldLog(severity <= reportableSeverity)
136
+ , mSeverity(severity)
137
+ {
138
+ }
139
+
140
+ LogStreamConsumer(LogStreamConsumer&& other)
141
+ : LogStreamConsumerBase(severityOstream(other.mSeverity), severityPrefix(other.mSeverity), other.mShouldLog)
142
+ , std::ostream(&mBuffer) // links the stream buffer with the stream
143
+ , mShouldLog(other.mShouldLog)
144
+ , mSeverity(other.mSeverity)
145
+ {
146
+ }
147
+
148
+ void setReportableSeverity(Severity reportableSeverity)
149
+ {
150
+ mShouldLog = mSeverity <= reportableSeverity;
151
+ mBuffer.setShouldLog(mShouldLog);
152
+ }
153
+
154
+ private:
155
+ static std::ostream& severityOstream(Severity severity)
156
+ {
157
+ return severity >= Severity::kINFO ? std::cout : std::cerr;
158
+ }
159
+
160
+ static std::string severityPrefix(Severity severity)
161
+ {
162
+ switch (severity)
163
+ {
164
+ case Severity::kINTERNAL_ERROR: return "[F] ";
165
+ case Severity::kERROR: return "[E] ";
166
+ case Severity::kWARNING: return "[W] ";
167
+ case Severity::kINFO: return "[I] ";
168
+ case Severity::kVERBOSE: return "[V] ";
169
+ default: assert(0); return "";
170
+ }
171
+ }
172
+
173
+ bool mShouldLog;
174
+ Severity mSeverity;
175
+ };
176
+
177
+ //! \class Logger
178
+ //!
179
+ //! \brief Class which manages logging of TensorRT tools and samples
180
+ //!
181
+ //! \details This class provides a common interface for TensorRT tools and samples to log information to the console,
182
+ //! and supports logging two types of messages:
183
+ //!
184
+ //! - Debugging messages with an associated severity (info, warning, error, or internal error/fatal)
185
+ //! - Test pass/fail messages
186
+ //!
187
+ //! The advantage of having all samples use this class for logging as opposed to emitting directly to stdout/stderr is
188
+ //! that the logic for controlling the verbosity and formatting of sample output is centralized in one location.
189
+ //!
190
+ //! In the future, this class could be extended to support dumping test results to a file in some standard format
191
+ //! (for example, JUnit XML), and providing additional metadata (e.g. timing the duration of a test run).
192
+ //!
193
+ //! TODO: For backwards compatibility with existing samples, this class inherits directly from the nvinfer1::ILogger
194
+ //! interface, which is problematic since there isn't a clean separation between messages coming from the TensorRT
195
+ //! library and messages coming from the sample.
196
+ //!
197
+ //! In the future (once all samples are updated to use Logger::getTRTLogger() to access the ILogger) we can refactor the
198
+ //! class to eliminate the inheritance and instead make the nvinfer1::ILogger implementation a member of the Logger
199
+ //! object.
200
+
201
+ class Logger : public nvinfer1::ILogger
202
+ {
203
+ public:
204
+ Logger(Severity severity = Severity::kWARNING)
205
+ : mReportableSeverity(severity)
206
+ {
207
+ }
208
+
209
+ //!
210
+ //! \enum TestResult
211
+ //! \brief Represents the state of a given test
212
+ //!
213
+ enum class TestResult
214
+ {
215
+ kRUNNING, //!< The test is running
216
+ kPASSED, //!< The test passed
217
+ kFAILED, //!< The test failed
218
+ kWAIVED //!< The test was waived
219
+ };
220
+
221
+ //!
222
+ //! \brief Forward-compatible method for retrieving the nvinfer::ILogger associated with this Logger
223
+ //! \return The nvinfer1::ILogger associated with this Logger
224
+ //!
225
+ //! TODO Once all samples are updated to use this method to register the logger with TensorRT,
226
+ //! we can eliminate the inheritance of Logger from ILogger
227
+ //!
228
+ nvinfer1::ILogger& getTRTLogger()
229
+ {
230
+ return *this;
231
+ }
232
+
233
+ //!
234
+ //! \brief Implementation of the nvinfer1::ILogger::log() virtual method
235
+ //!
236
+ //! Note samples should not be calling this function directly; it will eventually go away once we eliminate the
237
+ //! inheritance from nvinfer1::ILogger
238
+ //!
239
+ void log(Severity severity, const char* msg) override
240
+ {
241
+ LogStreamConsumer(mReportableSeverity, severity) << "[TRT] " << std::string(msg) << std::endl;
242
+ }
243
+
244
+ //!
245
+ //! \brief Method for controlling the verbosity of logging output
246
+ //!
247
+ //! \param severity The logger will only emit messages that have severity of this level or higher.
248
+ //!
249
+ void setReportableSeverity(Severity severity)
250
+ {
251
+ mReportableSeverity = severity;
252
+ }
253
+
254
+ //!
255
+ //! \brief Opaque handle that holds logging information for a particular test
256
+ //!
257
+ //! This object is an opaque handle to information used by the Logger to print test results.
258
+ //! The sample must call Logger::defineTest() in order to obtain a TestAtom that can be used
259
+ //! with Logger::reportTest{Start,End}().
260
+ //!
261
+ class TestAtom
262
+ {
263
+ public:
264
+ TestAtom(TestAtom&&) = default;
265
+
266
+ private:
267
+ friend class Logger;
268
+
269
+ TestAtom(bool started, const std::string& name, const std::string& cmdline)
270
+ : mStarted(started)
271
+ , mName(name)
272
+ , mCmdline(cmdline)
273
+ {
274
+ }
275
+
276
+ bool mStarted;
277
+ std::string mName;
278
+ std::string mCmdline;
279
+ };
280
+
281
+ //!
282
+ //! \brief Define a test for logging
283
+ //!
284
+ //! \param[in] name The name of the test. This should be a string starting with
285
+ //! "TensorRT" and containing dot-separated strings containing
286
+ //! the characters [A-Za-z0-9_].
287
+ //! For example, "TensorRT.sample_googlenet"
288
+ //! \param[in] cmdline The command line used to reproduce the test
289
+ //
290
+ //! \return a TestAtom that can be used in Logger::reportTest{Start,End}().
291
+ //!
292
+ static TestAtom defineTest(const std::string& name, const std::string& cmdline)
293
+ {
294
+ return TestAtom(false, name, cmdline);
295
+ }
296
+
297
+ //!
298
+ //! \brief A convenience overloaded version of defineTest() that accepts an array of command-line arguments
299
+ //! as input
300
+ //!
301
+ //! \param[in] name The name of the test
302
+ //! \param[in] argc The number of command-line arguments
303
+ //! \param[in] argv The array of command-line arguments (given as C strings)
304
+ //!
305
+ //! \return a TestAtom that can be used in Logger::reportTest{Start,End}().
306
+ static TestAtom defineTest(const std::string& name, int argc, char const* const* argv)
307
+ {
308
+ auto cmdline = genCmdlineString(argc, argv);
309
+ return defineTest(name, cmdline);
310
+ }
311
+
312
+ //!
313
+ //! \brief Report that a test has started.
314
+ //!
315
+ //! \pre reportTestStart() has not been called yet for the given testAtom
316
+ //!
317
+ //! \param[in] testAtom The handle to the test that has started
318
+ //!
319
+ static void reportTestStart(TestAtom& testAtom)
320
+ {
321
+ reportTestResult(testAtom, TestResult::kRUNNING);
322
+ assert(!testAtom.mStarted);
323
+ testAtom.mStarted = true;
324
+ }
325
+
326
+ //!
327
+ //! \brief Report that a test has ended.
328
+ //!
329
+ //! \pre reportTestStart() has been called for the given testAtom
330
+ //!
331
+ //! \param[in] testAtom The handle to the test that has ended
332
+ //! \param[in] result The result of the test. Should be one of TestResult::kPASSED,
333
+ //! TestResult::kFAILED, TestResult::kWAIVED
334
+ //!
335
+ static void reportTestEnd(const TestAtom& testAtom, TestResult result)
336
+ {
337
+ assert(result != TestResult::kRUNNING);
338
+ assert(testAtom.mStarted);
339
+ reportTestResult(testAtom, result);
340
+ }
341
+
342
+ static int reportPass(const TestAtom& testAtom)
343
+ {
344
+ reportTestEnd(testAtom, TestResult::kPASSED);
345
+ return EXIT_SUCCESS;
346
+ }
347
+
348
+ static int reportFail(const TestAtom& testAtom)
349
+ {
350
+ reportTestEnd(testAtom, TestResult::kFAILED);
351
+ return EXIT_FAILURE;
352
+ }
353
+
354
+ static int reportWaive(const TestAtom& testAtom)
355
+ {
356
+ reportTestEnd(testAtom, TestResult::kWAIVED);
357
+ return EXIT_SUCCESS;
358
+ }
359
+
360
+ static int reportTest(const TestAtom& testAtom, bool pass)
361
+ {
362
+ return pass ? reportPass(testAtom) : reportFail(testAtom);
363
+ }
364
+
365
+ Severity getReportableSeverity() const
366
+ {
367
+ return mReportableSeverity;
368
+ }
369
+
370
+ private:
371
+ //!
372
+ //! \brief returns an appropriate string for prefixing a log message with the given severity
373
+ //!
374
+ static const char* severityPrefix(Severity severity)
375
+ {
376
+ switch (severity)
377
+ {
378
+ case Severity::kINTERNAL_ERROR: return "[F] ";
379
+ case Severity::kERROR: return "[E] ";
380
+ case Severity::kWARNING: return "[W] ";
381
+ case Severity::kINFO: return "[I] ";
382
+ case Severity::kVERBOSE: return "[V] ";
383
+ default: assert(0); return "";
384
+ }
385
+ }
386
+
387
+ //!
388
+ //! \brief returns an appropriate string for prefixing a test result message with the given result
389
+ //!
390
+ static const char* testResultString(TestResult result)
391
+ {
392
+ switch (result)
393
+ {
394
+ case TestResult::kRUNNING: return "RUNNING";
395
+ case TestResult::kPASSED: return "PASSED";
396
+ case TestResult::kFAILED: return "FAILED";
397
+ case TestResult::kWAIVED: return "WAIVED";
398
+ default: assert(0); return "";
399
+ }
400
+ }
401
+
402
+ //!
403
+ //! \brief returns an appropriate output stream (cout or cerr) to use with the given severity
404
+ //!
405
+ static std::ostream& severityOstream(Severity severity)
406
+ {
407
+ return severity >= Severity::kINFO ? std::cout : std::cerr;
408
+ }
409
+
410
+ //!
411
+ //! \brief method that implements logging test results
412
+ //!
413
+ static void reportTestResult(const TestAtom& testAtom, TestResult result)
414
+ {
415
+ severityOstream(Severity::kINFO) << "&&&& " << testResultString(result) << " " << testAtom.mName << " # "
416
+ << testAtom.mCmdline << std::endl;
417
+ }
418
+
419
+ //!
420
+ //! \brief generate a command line string from the given (argc, argv) values
421
+ //!
422
+ static std::string genCmdlineString(int argc, char const* const* argv)
423
+ {
424
+ std::stringstream ss;
425
+ for (int i = 0; i < argc; i++)
426
+ {
427
+ if (i > 0)
428
+ ss << " ";
429
+ ss << argv[i];
430
+ }
431
+ return ss.str();
432
+ }
433
+
434
+ Severity mReportableSeverity;
435
+ };
436
+
437
+ namespace
438
+ {
439
+
440
+ //!
441
+ //! \brief produces a LogStreamConsumer object that can be used to log messages of severity kVERBOSE
442
+ //!
443
+ //! Example usage:
444
+ //!
445
+ //! LOG_VERBOSE(logger) << "hello world" << std::endl;
446
+ //!
447
+ inline LogStreamConsumer LOG_VERBOSE(const Logger& logger)
448
+ {
449
+ return LogStreamConsumer(logger.getReportableSeverity(), Severity::kVERBOSE);
450
+ }
451
+
452
+ //!
453
+ //! \brief produces a LogStreamConsumer object that can be used to log messages of severity kINFO
454
+ //!
455
+ //! Example usage:
456
+ //!
457
+ //! LOG_INFO(logger) << "hello world" << std::endl;
458
+ //!
459
+ inline LogStreamConsumer LOG_INFO(const Logger& logger)
460
+ {
461
+ return LogStreamConsumer(logger.getReportableSeverity(), Severity::kINFO);
462
+ }
463
+
464
+ //!
465
+ //! \brief produces a LogStreamConsumer object that can be used to log messages of severity kWARNING
466
+ //!
467
+ //! Example usage:
468
+ //!
469
+ //! LOG_WARN(logger) << "hello world" << std::endl;
470
+ //!
471
+ inline LogStreamConsumer LOG_WARN(const Logger& logger)
472
+ {
473
+ return LogStreamConsumer(logger.getReportableSeverity(), Severity::kWARNING);
474
+ }
475
+
476
+ //!
477
+ //! \brief produces a LogStreamConsumer object that can be used to log messages of severity kERROR
478
+ //!
479
+ //! Example usage:
480
+ //!
481
+ //! LOG_ERROR(logger) << "hello world" << std::endl;
482
+ //!
483
+ inline LogStreamConsumer LOG_ERROR(const Logger& logger)
484
+ {
485
+ return LogStreamConsumer(logger.getReportableSeverity(), Severity::kERROR);
486
+ }
487
+
488
+ //!
489
+ //! \brief produces a LogStreamConsumer object that can be used to log messages of severity kINTERNAL_ERROR
490
+ // ("fatal" severity)
491
+ //!
492
+ //! Example usage:
493
+ //!
494
+ //! LOG_FATAL(logger) << "hello world" << std::endl;
495
+ //!
496
+ inline LogStreamConsumer LOG_FATAL(const Logger& logger)
497
+ {
498
+ return LogStreamConsumer(logger.getReportableSeverity(), Severity::kINTERNAL_ERROR);
499
+ }
500
+
501
+ } // anonymous namespace
502
+
503
+ #endif // TENSORRT_LOGGING_H
demo/TensorRT/cpp/yolox.cpp ADDED
@@ -0,0 +1,554 @@
1
+ #include <fstream>
2
+ #include <iostream>
3
+ #include <sstream>
4
+ #include <numeric>
5
+ #include <chrono>
6
+ #include <vector>
7
+ #include <opencv2/opencv.hpp>
8
+ #include <dirent.h>
9
+ #include "NvInfer.h"
10
+ #include "cuda_runtime_api.h"
11
+ #include "logging.h"
12
+
13
+ #define CHECK(status) \
14
+ do\
15
+ {\
16
+ auto ret = (status);\
17
+ if (ret != 0)\
18
+ {\
19
+ std::cerr << "Cuda failure: " << ret << std::endl;\
20
+ abort();\
21
+ }\
22
+ } while (0)
23
+
24
+ #define DEVICE 0 // GPU id
25
+ #define NMS_THRESH 0.65
26
+ #define BBOX_CONF_THRESH 0.3
27
+
28
+ using namespace nvinfer1;
29
+
30
+ // stuff we know about the network and the input/output blobs
31
+ static const int INPUT_W = 640;
32
+ static const int INPUT_H = 640;
33
+ const char* INPUT_BLOB_NAME = "input_0";
34
+ const char* OUTPUT_BLOB_NAME = "output_0";
35
+ static Logger gLogger;
36
+
37
+ cv::Mat static_resize(cv::Mat& img) {
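+ // letterbox preprocessing: keep the aspect ratio, resize into the 640x640 network input, pad the bottom/right with 114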
38
+ float r = std::min(INPUT_W / (img.cols*1.0), INPUT_H / (img.rows*1.0));
39
+ // r = std::min(r, 1.0f);
40
+ int unpad_w = r * img.cols;
41
+ int unpad_h = r * img.rows;
42
+ cv::Mat re(unpad_h, unpad_w, CV_8UC3);
43
+ cv::resize(img, re, re.size());
44
+ cv::Mat out(INPUT_W, INPUT_H, CV_8UC3, cv::Scalar(114, 114, 114));
45
+ re.copyTo(out(cv::Rect(0, 0, re.cols, re.rows)));
46
+ return out;
47
+ }
48
+
49
+ struct Object
50
+ {
51
+ cv::Rect_<float> rect;
52
+ int label;
53
+ float prob;
54
+ };
55
+
56
+ struct GridAndStride
57
+ {
58
+ int grid0;
59
+ int grid1;
60
+ int stride;
61
+ };
62
+
63
+ static void generate_grids_and_stride(const int target_size, std::vector<int>& strides, std::vector<GridAndStride>& grid_strides)
64
+ {
65
+ for (auto stride : strides)
66
+ {
67
+ int num_grid = target_size / stride;
68
+ for (int g1 = 0; g1 < num_grid; g1++)
69
+ {
70
+ for (int g0 = 0; g0 < num_grid; g0++)
71
+ {
72
+ grid_strides.push_back(GridAndStride{g0, g1, stride});
73
+ }
74
+ }
75
+ }
76
+ }
77
+
78
+ static inline float intersection_area(const Object& a, const Object& b)
79
+ {
80
+ cv::Rect_<float> inter = a.rect & b.rect;
81
+ return inter.area();
82
+ }
83
+
84
+ static void qsort_descent_inplace(std::vector<Object>& faceobjects, int left, int right)
85
+ {
86
+ int i = left;
87
+ int j = right;
88
+ float p = faceobjects[(left + right) / 2].prob;
89
+
90
+ while (i <= j)
91
+ {
92
+ while (faceobjects[i].prob > p)
93
+ i++;
94
+
95
+ while (faceobjects[j].prob < p)
96
+ j--;
97
+
98
+ if (i <= j)
99
+ {
100
+ // swap
101
+ std::swap(faceobjects[i], faceobjects[j]);
102
+
103
+ i++;
104
+ j--;
105
+ }
106
+ }
107
+
108
+ #pragma omp parallel sections
109
+ {
110
+ #pragma omp section
111
+ {
112
+ if (left < j) qsort_descent_inplace(faceobjects, left, j);
113
+ }
114
+ #pragma omp section
115
+ {
116
+ if (i < right) qsort_descent_inplace(faceobjects, i, right);
117
+ }
118
+ }
119
+ }
120
+
121
+ static void qsort_descent_inplace(std::vector<Object>& objects)
122
+ {
123
+ if (objects.empty())
124
+ return;
125
+
126
+ qsort_descent_inplace(objects, 0, objects.size() - 1);
127
+ }
128
+
129
+ static void nms_sorted_bboxes(const std::vector<Object>& faceobjects, std::vector<int>& picked, float nms_threshold)
130
+ {
131
+ picked.clear();
132
+
133
+ const int n = faceobjects.size();
134
+
135
+ std::vector<float> areas(n);
136
+ for (int i = 0; i < n; i++)
137
+ {
138
+ areas[i] = faceobjects[i].rect.area();
139
+ }
140
+
141
+ for (int i = 0; i < n; i++)
142
+ {
143
+ const Object& a = faceobjects[i];
144
+
145
+ int keep = 1;
146
+ for (int j = 0; j < (int)picked.size(); j++)
147
+ {
148
+ const Object& b = faceobjects[picked[j]];
149
+
150
+ // intersection over union
151
+ float inter_area = intersection_area(a, b);
152
+ float union_area = areas[i] + areas[picked[j]] - inter_area;
153
+ // float IoU = inter_area / union_area
154
+ if (inter_area / union_area > nms_threshold)
155
+ keep = 0;
156
+ }
157
+
158
+ if (keep)
159
+ picked.push_back(i);
160
+ }
161
+ }
162
+
163
+
164
+ static void generate_yolox_proposals(std::vector<GridAndStride> grid_strides, float* feat_blob, float prob_threshold, std::vector<Object>& objects)
165
+ {
166
+ const int num_class = 80;
167
+
168
+ const int num_anchors = grid_strides.size();
169
+
170
+ for (int anchor_idx = 0; anchor_idx < num_anchors; anchor_idx++)
171
+ {
172
+ const int grid0 = grid_strides[anchor_idx].grid0;
173
+ const int grid1 = grid_strides[anchor_idx].grid1;
174
+ const int stride = grid_strides[anchor_idx].stride;
175
+
176
+ const int basic_pos = anchor_idx * 85;
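+ // each anchor occupies 85 consecutive floats: cx, cy, w, h, objectness, then 80 class scores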
177
+
178
+ // yolox/models/yolo_head.py decode logic
179
+ float x_center = (feat_blob[basic_pos+0] + grid0) * stride;
180
+ float y_center = (feat_blob[basic_pos+1] + grid1) * stride;
181
+ float w = exp(feat_blob[basic_pos+2]) * stride;
182
+ float h = exp(feat_blob[basic_pos+3]) * stride;
183
+ float x0 = x_center - w * 0.5f;
184
+ float y0 = y_center - h * 0.5f;
185
+
186
+ float box_objectness = feat_blob[basic_pos+4];
187
+ for (int class_idx = 0; class_idx < num_class; class_idx++)
188
+ {
189
+ float box_cls_score = feat_blob[basic_pos + 5 + class_idx];
190
+ float box_prob = box_objectness * box_cls_score;
191
+ if (box_prob > prob_threshold)
192
+ {
193
+ Object obj;
194
+ obj.rect.x = x0;
195
+ obj.rect.y = y0;
196
+ obj.rect.width = w;
197
+ obj.rect.height = h;
198
+ obj.label = class_idx;
199
+ obj.prob = box_prob;
200
+
201
+ objects.push_back(obj);
202
+ }
203
+
204
+ } // class loop
205
+
206
+ } // point anchor loop
207
+ }
208
+
209
+ float* blobFromImage(cv::Mat& img){
210
+ cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
211
+
212
+ float* blob = new float[img.total()*3];
213
+ int channels = 3;
214
+ int img_h = 640;
215
+ int img_w = 640;
216
+ std::vector<float> mean = {0.485, 0.456, 0.406};
217
+ std::vector<float> std = {0.229, 0.224, 0.225};
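+ // write the blob in CHW order: scale each RGB pixel to [0,1], then normalize with ImageNet mean/std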
218
+ for (size_t c = 0; c < channels; c++)
219
+ {
220
+ for (size_t h = 0; h < img_h; h++)
221
+ {
222
+ for (size_t w = 0; w < img_w; w++)
223
+ {
224
+ blob[c * img_w * img_h + h * img_w + w] =
225
+ (((float)img.at<cv::Vec3b>(h, w)[c]) / 255.0f - mean[c]) / std[c];
226
+ }
227
+ }
228
+ }
229
+ return blob;
230
+ }
231
+
232
+
233
+ int read_files_in_dir(const char *p_dir_name, std::vector<std::string> &file_names) {
234
+ DIR *p_dir = opendir(p_dir_name);
235
+ if (p_dir == nullptr) {
236
+ return -1;
237
+ }
238
+
239
+ struct dirent* p_file = nullptr;
240
+ while ((p_file = readdir(p_dir)) != nullptr) {
241
+ if (strcmp(p_file->d_name, ".") != 0 &&
242
+ strcmp(p_file->d_name, "..") != 0) {
243
+ std::string cur_file_name(p_file->d_name);
244
+ file_names.push_back(cur_file_name);
245
+ }
246
+ }
247
+
248
+ closedir(p_dir);
249
+ return 0;
250
+ }
251
+
252
+ static void decode_outputs(float* prob, std::vector<Object>& objects, float scale, const int img_w, const int img_h) {
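+ // decode raw predictions into proposals, run NMS, then rescale the kept boxes to the original image and clip them to its bounds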
253
+ std::vector<Object> proposals;
254
+ std::vector<int> strides = {8, 16, 32};
255
+ std::vector<GridAndStride> grid_strides;
256
+ generate_grids_and_stride(INPUT_W, strides, grid_strides);
257
+ generate_yolox_proposals(grid_strides, prob, BBOX_CONF_THRESH, proposals);
258
+ std::cout << "num of boxes before nms: " << proposals.size() << std::endl;
259
+
260
+ qsort_descent_inplace(proposals);
261
+
262
+ std::vector<int> picked;
263
+ nms_sorted_bboxes(proposals, picked, NMS_THRESH);
264
+
265
+
266
+ int count = picked.size();
267
+
268
+ std::cout << "num of boxes: " << count << std::endl;
269
+
270
+ objects.resize(count);
271
+ for (int i = 0; i < count; i++)
272
+ {
273
+ objects[i] = proposals[picked[i]];
274
+
275
+ // adjust offset to original unpadded
276
+ float x0 = (objects[i].rect.x) / scale;
277
+ float y0 = (objects[i].rect.y) / scale;
278
+ float x1 = (objects[i].rect.x + objects[i].rect.width) / scale;
279
+ float y1 = (objects[i].rect.y + objects[i].rect.height) / scale;
280
+
281
+ // clip
282
+ x0 = std::max(std::min(x0, (float)(img_w - 1)), 0.f);
283
+ y0 = std::max(std::min(y0, (float)(img_h - 1)), 0.f);
284
+ x1 = std::max(std::min(x1, (float)(img_w - 1)), 0.f);
285
+ y1 = std::max(std::min(y1, (float)(img_h - 1)), 0.f);
286
+
287
+ objects[i].rect.x = x0;
288
+ objects[i].rect.y = y0;
289
+ objects[i].rect.width = x1 - x0;
290
+ objects[i].rect.height = y1 - y0;
291
+ }
292
+ }
293
+
294
+ const float color_list[80][3] =
295
+ {
296
+ {0.000, 0.447, 0.741},
297
+ {0.850, 0.325, 0.098},
298
+ {0.929, 0.694, 0.125},
299
+ {0.494, 0.184, 0.556},
300
+ {0.466, 0.674, 0.188},
301
+ {0.301, 0.745, 0.933},
302
+ {0.635, 0.078, 0.184},
303
+ {0.300, 0.300, 0.300},
304
+ {0.600, 0.600, 0.600},
305
+ {1.000, 0.000, 0.000},
306
+ {1.000, 0.500, 0.000},
307
+ {0.749, 0.749, 0.000},
308
+ {0.000, 1.000, 0.000},
309
+ {0.000, 0.000, 1.000},
310
+ {0.667, 0.000, 1.000},
311
+ {0.333, 0.333, 0.000},
312
+ {0.333, 0.667, 0.000},
313
+ {0.333, 1.000, 0.000},
314
+ {0.667, 0.333, 0.000},
315
+ {0.667, 0.667, 0.000},
316
+ {0.667, 1.000, 0.000},
317
+ {1.000, 0.333, 0.000},
318
+ {1.000, 0.667, 0.000},
319
+ {1.000, 1.000, 0.000},
320
+ {0.000, 0.333, 0.500},
321
+ {0.000, 0.667, 0.500},
322
+ {0.000, 1.000, 0.500},
323
+ {0.333, 0.000, 0.500},
324
+ {0.333, 0.333, 0.500},
325
+ {0.333, 0.667, 0.500},
326
+ {0.333, 1.000, 0.500},
327
+ {0.667, 0.000, 0.500},
328
+ {0.667, 0.333, 0.500},
329
+ {0.667, 0.667, 0.500},
330
+ {0.667, 1.000, 0.500},
331
+ {1.000, 0.000, 0.500},
332
+ {1.000, 0.333, 0.500},
333
+ {1.000, 0.667, 0.500},
334
+ {1.000, 1.000, 0.500},
335
+ {0.000, 0.333, 1.000},
336
+ {0.000, 0.667, 1.000},
337
+ {0.000, 1.000, 1.000},
338
+ {0.333, 0.000, 1.000},
339
+ {0.333, 0.333, 1.000},
340
+ {0.333, 0.667, 1.000},
341
+ {0.333, 1.000, 1.000},
342
+ {0.667, 0.000, 1.000},
343
+ {0.667, 0.333, 1.000},
344
+ {0.667, 0.667, 1.000},
345
+ {0.667, 1.000, 1.000},
346
+ {1.000, 0.000, 1.000},
347
+ {1.000, 0.333, 1.000},
348
+ {1.000, 0.667, 1.000},
349
+ {0.333, 0.000, 0.000},
350
+ {0.500, 0.000, 0.000},
351
+ {0.667, 0.000, 0.000},
352
+ {0.833, 0.000, 0.000},
353
+ {1.000, 0.000, 0.000},
354
+ {0.000, 0.167, 0.000},
355
+ {0.000, 0.333, 0.000},
356
+ {0.000, 0.500, 0.000},
357
+ {0.000, 0.667, 0.000},
358
+ {0.000, 0.833, 0.000},
359
+ {0.000, 1.000, 0.000},
360
+ {0.000, 0.000, 0.167},
361
+ {0.000, 0.000, 0.333},
362
+ {0.000, 0.000, 0.500},
363
+ {0.000, 0.000, 0.667},
364
+ {0.000, 0.000, 0.833},
365
+ {0.000, 0.000, 1.000},
366
+ {0.000, 0.000, 0.000},
367
+ {0.143, 0.143, 0.143},
368
+ {0.286, 0.286, 0.286},
369
+ {0.429, 0.429, 0.429},
370
+ {0.571, 0.571, 0.571},
371
+ {0.714, 0.714, 0.714},
372
+ {0.857, 0.857, 0.857},
373
+ {0.000, 0.447, 0.741},
374
+ {0.314, 0.717, 0.741},
375
+ {0.50, 0.5, 0}
376
+ };
377
+
378
+ static void draw_objects(const cv::Mat& bgr, const std::vector<Object>& objects, std::string f)
379
+ {
380
+ static const char* class_names[] = {
381
+ "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
382
+ "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
383
+ "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
384
+ "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
385
+ "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
386
+ "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
387
+ "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
388
+ "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
389
+ "hair drier", "toothbrush"
390
+ };
391
+
392
+ cv::Mat image = bgr.clone();
393
+
394
+ for (size_t i = 0; i < objects.size(); i++)
395
+ {
396
+ const Object& obj = objects[i];
397
+
398
+ fprintf(stderr, "%d = %.5f at %.2f %.2f %.2f x %.2f\n", obj.label, obj.prob,
399
+ obj.rect.x, obj.rect.y, obj.rect.width, obj.rect.height);
400
+
401
+ cv::Scalar color = cv::Scalar(color_list[obj.label][0], color_list[obj.label][1], color_list[obj.label][2]);
402
+ float c_mean = cv::mean(color)[0];
403
+ cv::Scalar txt_color;
404
+ if (c_mean > 0.5){
405
+ txt_color = cv::Scalar(0, 0, 0);
406
+ }else{
407
+ txt_color = cv::Scalar(255, 255, 255);
408
+ }
409
+
410
+ cv::rectangle(image, obj.rect, color * 255, 2);
411
+
412
+ char text[256];
413
+ sprintf(text, "%s %.1f%%", class_names[obj.label], obj.prob * 100);
414
+
415
+ int baseLine = 0;
416
+ cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_COMPLEX, 0.4, 1, &baseLine);
417
+
418
+ cv::Scalar txt_bk_color = color * 0.7 * 255;
419
+
420
+ int x = obj.rect.x;
421
+ int y = obj.rect.y + 1;
422
+ //int y = obj.rect.y - label_size.height - baseLine;
423
+ if (y > image.rows)
424
+ y = image.rows;
425
+ //if (x + label_size.width > image.cols)
426
+ //x = image.cols - label_size.width;
427
+
428
+ cv::rectangle(image, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)),
429
+ txt_bk_color, -1);
430
+
431
+ cv::putText(image, text, cv::Point(x, y + label_size.height),
432
+ cv::FONT_HERSHEY_COMPLEX, 0.4, txt_color, 1);
433
+ }
434
+
435
+ cv::imwrite("_" + f, image);
436
+ fprintf(stderr, "save vis file\n");
437
+ /* cv::imshow("image", image); */
438
+ /* cv::waitKey(0); */
439
+ }
440
+
441
+
442
+ void doInference(IExecutionContext& context, float* input, float* output, const int output_size, cv::Size input_shape) {
443
+ const ICudaEngine& engine = context.getEngine();
444
+
445
+ // Pointers to input and output device buffers to pass to engine.
446
+ // Engine requires exactly IEngine::getNbBindings() number of buffers.
447
+ assert(engine.getNbBindings() == 2);
448
+ void* buffers[2];
449
+
450
+ // In order to bind the buffers, we need to know the names of the input and output tensors.
451
+ // Note that indices are guaranteed to be less than IEngine::getNbBindings()
452
+ const int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME);
453
+
454
+ assert(engine.getBindingDataType(inputIndex) == nvinfer1::DataType::kFLOAT);
455
+ const int outputIndex = engine.getBindingIndex(OUTPUT_BLOB_NAME);
456
+ assert(engine.getBindingDataType(outputIndex) == nvinfer1::DataType::kFLOAT);
457
+ int mBatchSize = engine.getMaxBatchSize();
458
+
459
+ // Create GPU buffers on device
460
+ CHECK(cudaMalloc(&buffers[inputIndex], 3 * input_shape.height * input_shape.width * sizeof(float)));
461
+ CHECK(cudaMalloc(&buffers[outputIndex], output_size*sizeof(float)));
462
+
463
+ // Create stream
464
+ cudaStream_t stream;
465
+ CHECK(cudaStreamCreate(&stream));
466
+
467
+ // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
468
+ CHECK(cudaMemcpyAsync(buffers[inputIndex], input, 3 * input_shape.height * input_shape.width * sizeof(float), cudaMemcpyHostToDevice, stream));
469
+ context.enqueue(1, buffers, stream, nullptr);
470
+ CHECK(cudaMemcpyAsync(output, buffers[outputIndex], output_size * sizeof(float), cudaMemcpyDeviceToHost, stream));
471
+ cudaStreamSynchronize(stream);
472
+
473
+ // Release stream and buffers
474
+ cudaStreamDestroy(stream);
475
+ CHECK(cudaFree(buffers[inputIndex]));
476
+ CHECK(cudaFree(buffers[outputIndex]));
477
+ }
478
+
479
+ int main(int argc, char** argv) {
480
+ cudaSetDevice(DEVICE);
481
+ // create a model using the API directly and serialize it to a stream
482
+ char *trtModelStream{nullptr};
483
+ size_t size{0};
484
+
485
+ if (argc == 3 && std::string(argv[1]) == "-d") {
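+ // '-d' mode: deserialize a prebuilt engine; model_trt.engine is expected in the current working directory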
486
+ std::ifstream file("model_trt.engine", std::ios::binary);
487
+ if (file.good()) {
488
+ file.seekg(0, file.end);
489
+ size = file.tellg();
490
+ file.seekg(0, file.beg);
491
+ trtModelStream = new char[size];
492
+ assert(trtModelStream);
493
+ file.read(trtModelStream, size);
494
+ file.close();
495
+ }
496
+ } else {
497
+ std::cerr << "arguments not right!" << std::endl;
498
+ std::cerr << "run 'python3 yolox/deploy/trt.py -n yolox-{tiny, s, m, l, x}' to serialize model first!" << std::endl;
499
+ std::cerr << "./yolox -d ../samples // deserialize file and run inference" << std::endl;
500
+ return -1;
501
+ }
502
+
503
+ std::vector<std::string> file_names;
504
+ if (read_files_in_dir(argv[2], file_names) < 0) {
505
+ std::cout << "read_files_in_dir failed." << std::endl;
506
+ return -1;
507
+ }
508
+
509
+ IRuntime* runtime = createInferRuntime(gLogger);
510
+ assert(runtime != nullptr);
511
+ ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size);
512
+ assert(engine != nullptr);
513
+ IExecutionContext* context = engine->createExecutionContext();
514
+ assert(context != nullptr);
515
+ delete[] trtModelStream;
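+ // binding 1 is assumed to be the output tensor; multiply its dimensions to get the flattened element count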
516
+ auto out_dims = engine->getBindingDimensions(1);
517
+ auto output_size = 1;
518
+ for(int j=0;j<out_dims.nbDims;j++) {
519
+ output_size *= out_dims.d[j];
520
+ }
521
+ static float* prob = new float[output_size];
522
+
523
+ int fcount = 0;
524
+ for (auto f: file_names) {
525
+ fcount++;
526
+ std::cout << fcount << " " << f << std::endl;
527
+ cv::Mat img = cv::imread(std::string(argv[2]) + "/" + f);
528
+ if (img.empty()) continue;
529
+ int img_w = img.cols;
530
+ int img_h = img.rows;
531
+ cv::Mat pr_img = static_resize(img);
532
+ std::cout << "blob image" << std::endl;
533
+
534
+ float* blob;
535
+ blob = blobFromImage(pr_img);
536
+ float scale = std::min(INPUT_W / (img.cols*1.0), INPUT_H / (img.rows*1.0));
537
+
538
+ // Run inference
539
+ auto start = std::chrono::system_clock::now();
540
+ doInference(*context, blob, prob, output_size, pr_img.size());
541
+ auto end = std::chrono::system_clock::now();
542
+ std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
543
+
544
+ std::vector<Object> objects;
545
+ decode_outputs(prob, objects, scale, img_w, img_h);
546
+ draw_objects(img, objects, f);
+ delete[] blob;
547
+ }
548
+
549
+ // Destroy the engine
550
+ context->destroy();
551
+ engine->destroy();
552
+ runtime->destroy();
553
+ return 0;
554
+ }
demo/TensorRT/python/README.md ADDED
@@ -0,0 +1,46 @@
1
+ # User Guide for Deploying YOLOX on TensorRT
2
+
3
+ This tutorial includes a Python demo for TensorRT.
4
+
5
+ ## Install TensorRT Toolkit
6
+
7
+ Please follow the [TensorRT Installation Guide](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html) and the [torch2trt repo](https://github.com/NVIDIA-AI-IOT/torch2trt) to install TensorRT and torch2trt.
8
+
9
+ ## Convert model
10
+
11
+ YOLOX models can be easily converted to TensorRT models using torch2trt.
12
+
13
+ If you want to convert one of our standard models, use the flag -n to specify the model name:
14
+ ```shell
15
+ python tools/deploy/trt.py -n <YOLOX_MODEL_NAME> -c <YOLOX_CHECKPOINT>
16
+ ```
17
+ For example:
18
+ ```shell
19
+ python tools/deploy/trt.py -n yolox-s -c your_ckpt.pth.tar
20
+ ```
21
+ <YOLOX_MODEL_NAME> can be: yolox-nano, yolox-tiny, yolox-s, yolox-m, yolox-l, yolox-x.
22
+
23
+ If you want to convert your customized model, use the flag -f to specify your exp file:
24
+ ```shell
25
+ python tools/deploy/trt.py -f <YOLOX_EXP_FILE> -c <YOLOX_CHECKPOINT>
26
+ ```
27
+ For example:
28
+ ```shell
29
+ python tools/deploy/trt.py -f /path/to/your/yolox/exps/yolox_s.py -c your_ckpt.pth.tar
30
+ ```
31
+ *yolox_s.py* can be any exp file you have modified.
32
+
33
+ The converted model and the serialized engine file (for the C++ demo) will be saved in your experiment output directory.
34
+
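+ For reference, the conversion follows the standard torch2trt workflow. The sketch below is only illustrative (the function name, default paths, and the `decode_in_inference` attribute are assumptions, not the exact contents of *trt.py*):
+
+ ```python
+ import torch
+ from torch2trt import torch2trt
+
+ def convert_to_trt(model, engine_path="model_trt.engine", ckpt_path="model_trt.pth", size=640):
+     """Illustrative sketch: convert a YOLOX torch model to TensorRT with torch2trt."""
+     model = model.eval().cuda()
+     # keep the head output raw; the Python/C++ demos decode boxes themselves (assumed attribute)
+     model.head.decode_in_inference = False
+     x = torch.ones(1, 3, size, size).cuda()  # dummy input used for conversion
+     model_trt = torch2trt(model, [x], fp16_mode=True, max_workspace_size=(1 << 32))
+     torch.save(model_trt.state_dict(), ckpt_path)  # reloadable via torch2trt.TRTModule
+     with open(engine_path, "wb") as f:  # serialized engine consumed by the C++ demo
+         f.write(model_trt.engine.serialize())
+     return model_trt
+ ```
+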
35
+ ## Demo
36
+
37
+ The TensorRT Python demo is merged into our PyTorch demo file, so you can run the PyTorch demo command with the ```--trt``` flag.
38
+
39
+ ```shell
40
+ python tools/demo.py -n yolox-s --trt --conf 0.3 --nms 0.65 --tsize 640
41
+ ```
42
+ or
43
+ ```shell
44
+ python tools/demo.py -f exps/base/yolox_s.py --trt --conf 0.3 --nms 0.65 --tsize 640
45
+ ```
46
+