葛政 (Intern) committed on
Commit
d9f51c5
·
1 Parent(s): 65998a0

feat(demo): add OpenVINO and ONNXRuntime demo

README.md CHANGED
@@ -1,2 +1,128 @@
- # YOLOX
- Higher performance and anchor-free YOLO detector. Code of train/test/deploy included.
+ <div align="center"><img src="assets/logo.png" width="600"></div>
+
+ <img src="assets/demo.png" >
+
+ ## <div align="center">Introduction</div>
+ YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between the research and industrial communities.
+
+
+ ## <div align="center">Why YOLOX?</div>
+
+ <div align="center"><img src="assets/fig1.png" width="400" ><img src="assets/fig2.png" width="400"></div>
+
+ ## <div align="center">News!!</div>
+ * 【2021/07/19】 We have released our technical report on [Arxiv](xxx)!
+
+ ## <div align="center">Benchmark</div>
+
+ ### Standard Models.
+ |Model |size |mAP<sup>test<br>0.5:0.95 | Speed V100<br>(ms) | Params<br>(M) |FLOPs<br>(B)| weights |
+ | ------ |:---: | :---: |:---: |:---: | :---: | :----: |
+ |[YOLOX-s]() |640 |39.6 |9.8 |9.0 | 26.8 | - |
+ |[YOLOX-m]() |640 |46.4 |12.3 |25.3 |73.8| - |
+ |[YOLOX-l]() |640 |50.0 |14.5 |54.2| 155.6 | - |
+ |[YOLOX-x]() |640 |**51.2** | 17.3 |99.1 |281.9 | - |
+
+ ### Light Models.
+ |Model |size |mAP<sup>val<br>0.5:0.95 | Speed V100<br>(ms) | Params<br>(M) |FLOPs<br>(B)| weights |
+ | ------ |:---: | :---: |:---: |:---: | :---: | :----: |
+ |[YOLOX-Nano]() |416 |25.3 |- | 0.91 |1.08 | - |
+ |[YOLOX-Tiny]() |416 |31.7 |- | 5.06 |6.45 | - |
+
+ ## <div align="center">Quick Start</div>
+
+ ### Installation
+
+ Step1. Install [apex](https://github.com/NVIDIA/apex).
+
+ ```shell
+ git clone https://github.com/NVIDIA/apex
+ cd apex
+ pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
+ ```
+ Step2. Install YOLOX.
+ ```bash
+ $ git clone git@github.com:Megvii-BaseDetection/YOLOX.git
+ $ cd YOLOX
+ $ pip3 install -v -e .  # or "python3 setup.py develop"
+ ```
+
+ ### Demo
+
+ You can use either -n or -f to specify your detector's config:
+
+ ```shell
+ python tools/demo.py -n yolox-s -c <MODEL_PATH> --conf 0.3 --nms 0.65 --tsize 640
+ ```
+ or
+ ```shell
+ python tools/demo.py -f exps/base/yolox_s.py -c <MODEL_PATH> --conf 0.3 --nms 0.65 --tsize 640
+ ```
+
+
+ <details open>
+ <summary>Reproduce our results on COCO</summary>
+
+ Step1.
+
+ * Reproduce our results on COCO by specifying -n:
+
+ ```shell
+ python tools/train.py -n yolox-s -d 8 -b 64 --fp16 -o
+                          yolox-m
+                          yolox-l
+                          yolox-x
+ ```
+ Notes:
+ * -d: number of GPU devices
+ * -b: total batch size; the recommended value for -b is num_gpu * 8
+ * --fp16: mixed precision training
+
+ The above commands are equivalent to:
+
+ ```shell
+ python tools/train.py -f exps/base/yolox_s.py -d 8 -b 64 --fp16 -o
+                          exps/base/yolox_m.py
+                          exps/base/yolox_l.py
+                          exps/base/yolox_x.py
+ ```
+
+ * Customize your training (a minimal experiment-file sketch is shown after this section).
+
+ * Fine-tune COCO-pretrained models on your own dataset.
+ </details>
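For the "customize your training" item above, a minimal, hedged sketch of what a custom experiment file passed to `-f` can look like. It assumes the `yolox.exp.Exp` base class and the attribute names used by the `exps/base` configs, which may differ in your checkout:

```python
# Hypothetical exps/base/your_exp.py: a minimal custom experiment for "-f".
# Attribute names (depth, width, num_classes, max_epoch) are assumptions based on yolox.exp.Exp.
from yolox.exp import Exp as BaseExp


class Exp(BaseExp):
    def __init__(self):
        super().__init__()
        self.depth = 0.33     # YOLOX-s depth multiplier
        self.width = 0.50     # YOLOX-s width multiplier
        self.num_classes = 3  # class count of your own dataset
        self.max_epoch = 100  # adjust the schedule as needed
```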
+
+ <details open>
+ <summary>Evaluation</summary>
+ We support batch testing for fast evaluation:
+
+ ```shell
+ python tools/eval.py -n yolox-s -b 64 --conf 0.001 --fp16 (optional) --fuse (optional) --test (for test-dev set)
+                         yolox-m
+                         yolox-l
+                         yolox-x
+ ```
+
+ To reproduce the speed test, we use the following command:
+ ```shell
+ python tools/eval.py -n yolox-s -b 1 -d 0 --conf 0.001 --fp16 --fuse --test (for test-dev set)
+                         yolox-m
+                         yolox-l
+                         yolox-x
+ ```
+ </details>
+
+ ## <div align="center">Deployment</div>
+
+ 1. [ONNX: including ONNX export and an ONNXRuntime demo.]()
+ 2. [TensorRT in both C++ and Python]()
+ 3. [NCNN in C++]()
+ 4. [OpenVINO in both C++ and Python]()
+
+ ## <div align="center">Cite Our Work</div>
+
+
+ If you find this project useful, please cite it with the following BibTeX entry.
+
+ TODO
demo/ONNXRuntime/README.md ADDED
@@ -0,0 +1,66 @@
+ ## ONNXRuntime Demo in Python
+
+ This doc introduces how to convert your PyTorch model into ONNX, and how to run an ONNXRuntime demo to verify the conversion.
+
+ ### Download ONNX models.
+ | Model | Parameters | GFLOPs | Test Size | mAP |
+ |:------| :----: | :----: | :---: | :---: |
+ | [YOLOX-Nano](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.res101.fpn.coco.800size.1x) | 0.91M | 1.08 | 416x416 | 25.3 |
+ | [YOLOX-Tiny](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.fpn.coco.800size.1x) | 5.06M | 6.45 | 416x416 | 31.7 |
+ | [YOLOX-S](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 9.0M | 26.8 | 640x640 | 39.6 |
+ | [YOLOX-M](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 25.3M | 73.8 | 640x640 | 46.4 |
+ | [YOLOX-L](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 54.2M | 155.6 | 640x640 | 50.0 |
+ | [YOLOX-X](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 99.1M | 281.9 | 640x640 | 51.2 |
+ | [YOLOX-Darknet53](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 63.72M | 185.3 | 640x640 | 47.3 |
+
+ ### Convert Your Model to ONNX
+
+ First, move to <YOLOX_HOME>:
+ ```shell
+ cd <YOLOX_HOME>
+ ```
+ Then, you can:
+
+ 1. Convert a standard YOLOX model by -n:
+ ```shell
+ python3 tools/export_onnx.py --output-name yolox_s.onnx -n yolox-s -c yolox_s.pth.tar
+ ```
+ Notes:
+ * -n: specify a model name. The model name must be one of [yolox-s, yolox-m, yolox-l, yolox-x, yolox-nano, yolox-tiny, yolov3]
+ * -c: the checkpoint of the model you have trained
+ * -o: opset version, default 11. **However, if you will further convert your onnx model to [OpenVINO](), please specify the opset version to 10.**
+ * --no-onnxsim: disable onnxsim
+ * To customize the input shape of the onnx model, modify the following code in tools/export_onnx.py:
+
+ ```python
+ dummy_input = torch.randn(1, 3, exp.test_size[0], exp.test_size[1])
+ ```
+
+ 2. Convert a standard YOLOX model by -f. By using -f, the above command is equivalent to:
+
+ ```shell
+ python3 tools/export_onnx.py --output-name yolox_s.onnx -f exps/yolox_s.py -c yolox_s.pth.tar
+ ```
+
+ 3. To convert your customized model, please use -f:
+
+ ```shell
+ python3 tools/export_onnx.py --output-name your_yolox.onnx -f exps/your_yolox.py -c your_yolox.pth.tar
+ ```
+
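To sanity-check an exported model before running the demo below, you can load it with onnxruntime and push a random input through it. A minimal sketch, assuming the `yolox_s.onnx` produced by the export command above:

```python
# Quick check that the exported ONNX model loads and produces the expected output shape.
import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession("yolox_s.onnx")
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
(output,) = session.run(None, {session.get_inputs()[0].name: dummy})
# Each row is [cx, cy, w, h, objectness, 80 class scores] before decoding and NMS.
print(output.shape)  # e.g. (1, 8400, 85) for a 640x640 input
```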
+ ### ONNXRuntime Demo
+
+ Step1.
+ ```shell
+ cd <YOLOX_HOME>/demo/ONNXRuntime/
+ ```
+
+ Step2.
+ ```shell
+ python3 onnx_inference.py -m <ONNX_MODEL_PATH> -i <IMAGE_PATH> -o <OUTPUT_DIR> -s 0.3 --input_shape 640,640
+ ```
+ Notes:
+ * -m: your converted onnx model
+ * -i: path to your input image
+ * -s: score threshold for visualization.
+ * --input_shape: should be consistent with the shape you used for onnx conversion.
demo/ONNXRuntime/demo_utils.py ADDED
@@ -0,0 +1,86 @@
1
+ import numpy as np
2
+
3
+ import os
4
+
5
+
6
+ def mkdir(path):
7
+ if not os.path.exists(path):
8
+ os.makedirs(path)
9
+
10
+
11
+ def nms(boxes, scores, nms_thr):
12
+ """Single class NMS implemented in Numpy."""
13
+ x1 = boxes[:, 0]
14
+ y1 = boxes[:, 1]
15
+ x2 = boxes[:, 2]
16
+ y2 = boxes[:, 3]
17
+
18
+ areas = (x2 - x1 + 1) * (y2 - y1 + 1)
19
+ order = scores.argsort()[::-1]
20
+
21
+ keep = []
22
+ while order.size > 0:
23
+ i = order[0]
24
+ keep.append(i)
25
+ xx1 = np.maximum(x1[i], x1[order[1:]])
26
+ yy1 = np.maximum(y1[i], y1[order[1:]])
27
+ xx2 = np.minimum(x2[i], x2[order[1:]])
28
+ yy2 = np.minimum(y2[i], y2[order[1:]])
29
+
30
+ w = np.maximum(0.0, xx2 - xx1 + 1)
31
+ h = np.maximum(0.0, yy2 - yy1 + 1)
32
+ inter = w * h
33
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
34
+
35
+ inds = np.where(ovr <= nms_thr)[0]
36
+ order = order[inds + 1]
37
+
38
+ return keep
39
+
40
+
41
+ def multiclass_nms(boxes, scores, nms_thr, score_thr):
42
+ """Multiclass NMS implemented in Numpy"""
43
+ final_dets = []
44
+ num_classes = scores.shape[1]
45
+ for cls_ind in range(num_classes):
46
+ cls_scores = scores[:, cls_ind]
47
+ valid_score_mask = cls_scores > score_thr
48
+ if valid_score_mask.sum() == 0:
49
+ continue
50
+ else:
51
+ valid_scores = cls_scores[valid_score_mask]
52
+ valid_boxes = boxes[valid_score_mask]
53
+ keep = nms(valid_boxes, valid_scores, nms_thr)
54
+ if len(keep) > 0:
55
+ cls_inds = np.ones((len(keep), 1)) * cls_ind
56
+ dets = np.concatenate([valid_boxes[keep], valid_scores[keep, None], cls_inds], 1)
57
+ final_dets.append(dets)
58
+ return np.concatenate(final_dets, 0)
59
+
60
+
61
+ def postprocess(outputs, img_size, p6=False):
62
+
63
+ grids = []
64
+ expanded_strides = []
65
+
66
+ if not p6:
67
+ strides = [8, 16, 32]
68
+ else:
69
+ strides = [8, 16, 32, 64]
70
+
71
+ hsizes = [img_size[0]//stride for stride in strides]
72
+ wsizes = [img_size[1]//stride for stride in strides]
73
+
74
+ for hsize, wsize, stride in zip(hsizes, wsizes, strides):
75
+ xv, yv = np.meshgrid(np.arange(hsize), np.arange(wsize))
76
+ grid = np.stack((xv, yv), 2).reshape(1, -1, 2)
77
+ grids.append(grid)
78
+ shape = grid.shape[:2]
79
+ expanded_strides.append(np.full((*shape, 1), stride))
80
+
81
+ grids = np.concatenate(grids, 1)
82
+ expanded_strides = np.concatenate(expanded_strides, 1)
83
+ outputs[..., :2] = (outputs[..., :2] + grids) * expanded_strides
84
+ outputs[..., 2:4] = np.exp(outputs[..., 2:4]) * expanded_strides
85
+
86
+ return outputs
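For intuition on the decode step in `postprocess` above: a raw prediction (tx, ty, tw, th) at grid cell (gx, gy) on a level with stride s maps to a box center of ((tx + gx) * s, (ty + gy) * s) and a size of (exp(tw) * s, exp(th) * s). A tiny illustrative check with made-up numbers:

```python
# Illustrative only: decode one fake prediction from the stride-32 level.
import numpy as np

tx, ty, tw, th = 0.5, 0.25, 0.0, 0.7             # raw network outputs (made up)
gx, gy, stride = 2, 3, 32                        # grid cell and level stride
cx, cy = (tx + gx) * stride, (ty + gy) * stride  # center: (80.0, 104.0)
w, h = np.exp(tw) * stride, np.exp(th) * stride  # size: (32.0, ~64.4)
print(cx, cy, w, h)
```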
demo/ONNXRuntime/onnx_inference.py ADDED
@@ -0,0 +1,90 @@
1
+ import cv2
2
+ import numpy as np
3
+
4
+ from yolox.data.data_augment import preproc as preprocess
5
+ from yolox.data.datasets import COCO_CLASSES
6
+ from yolox.utils.visualize import vis
7
+
8
+ import argparse
9
+ import onnxruntime
10
+ import os
11
+ from demo_utils import mkdir, multiclass_nms, postprocess
12
+
13
+
14
+ def make_parser():
15
+ parser = argparse.ArgumentParser("onnxruntime inference sample")
16
+ parser.add_argument(
17
+ "-m",
18
+ "--model",
19
+ type=str,
20
+ default="yolox.onnx",
21
+ help="Input your onnx model.",
22
+ )
23
+ parser.add_argument(
24
+ "-i",
25
+ "--image_path",
26
+ type=str,
27
+ default='test_image.png',
28
+ help="Path to your input image.",
29
+ )
30
+ parser.add_argument(
31
+ "-o",
32
+ "--output_dir",
33
+ type=str,
34
+ default='demo_output',
35
+ help="Path to your output directory.",
36
+ )
37
+ parser.add_argument(
38
+ "-s",
39
+ "--score_thr",
40
+ type=float,
41
+ default=0.3,
42
+ help="Score threshould to filter the result.",
43
+ )
44
+ parser.add_argument(
45
+ "--input_shape",
46
+ type=str,
47
+ default="640,640",
48
+ help="Specify an input shape for inference.",
49
+ )
50
+ parser.add_argument(
51
+ "--with_p6",
52
+ action="store_true",
53
+ help="Whether your model uses p6 in FPN/PAN.",
54
+ )
55
+ return parser
56
+
57
+
58
+ if __name__ == '__main__':
59
+ args = make_parser().parse_args()
60
+
61
+ input_shape = tuple(map(int, args.input_shape.split(',')))
62
+ origin_img = cv2.imread(args.image_path)
63
+ mean = (0.485, 0.456, 0.406)
64
+ std = (0.229, 0.224, 0.225)
65
+ img, ratio = preprocess(origin_img, input_shape, mean, std)
66
+
67
+ session = onnxruntime.InferenceSession(args.model)
68
+
69
+ ort_inputs = {session.get_inputs()[0].name: img[None, :, :, :]}
70
+ output = session.run(None, ort_inputs)
71
+ predictions = postprocess(output[0], input_shape, p6=args.with_p6)[0]
72
+
73
+ boxes = predictions[:, :4]
74
+ scores = predictions[:, 4:5] * predictions[:, 5:]
75
+
76
+ boxes_xyxy = np.ones_like(boxes)
77
+ boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2]/2.
78
+ boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3]/2.
79
+ boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2]/2.
80
+ boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3]/2.
81
+ boxes_xyxy /= ratio
82
+ dets = multiclass_nms(boxes_xyxy, scores, nms_thr=0.65, score_thr=0.1)
83
+
84
+ final_boxes, final_scores, final_cls_inds = dets[:, :4], dets[:, 4], dets[:, 5]
85
+ origin_img = vis(origin_img, final_boxes, final_scores, final_cls_inds,
86
+ conf=args.score_thr, class_names=COCO_CLASSES)
87
+
88
+ mkdir(args.output_dir)
89
+ output_path = os.path.join(args.output_dir, args.image_path.split("/")[-1])
90
+ cv2.imwrite(output_path, origin_img)
demo/OpenVINO/README.md ADDED
@@ -0,0 +1,4 @@
+ ## YOLOX on OpenVINO
+
+ * [C++ Demo]()
+ * [Python Demo]()
demo/OpenVINO/cpp/CMakeLists.txt ADDED
@@ -0,0 +1,23 @@
1
+ cmake_minimum_required(VERSION 3.4.1)
2
+ set(CMAKE_CXX_STANDARD 14)
3
+
4
+ project(yolox_openvino_demo)
5
+
6
+ find_package(OpenCV REQUIRED)
7
+ find_package(InferenceEngine REQUIRED)
8
+ find_package(ngraph REQUIRED)
9
+
10
+ include_directories(
11
+ ${OpenCV_INCLUDE_DIRS}
12
+ ${CMAKE_CURRENT_SOURCE_DIR}
13
+ ${CMAKE_CURRENT_BINARY_DIR}
14
+ )
15
+
16
+ add_executable(yolox_openvino yolox_openvino.cpp)
17
+
18
+ target_link_libraries(
19
+ yolox_openvino
20
+ ${InferenceEngine_LIBRARIES}
21
+ ${NGRAPH_LIBRARIES}
22
+ ${OpenCV_LIBS}
23
+ )
demo/OpenVINO/cpp/README.md ADDED
@@ -0,0 +1,94 @@
+ # User Guide for Deploying YOLOX on OpenVINO
+
+ This tutorial includes a C++ demo for OpenVINO, as well as some converted models.
+
+ ### Download OpenVINO models.
+ | Model | Parameters | GFLOPs | Test Size | mAP |
+ |:------| :----: | :----: | :---: | :---: |
+ | [YOLOX-Nano](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.res101.fpn.coco.800size.1x) | 0.91M | 1.08 | 416x416 | 25.3 |
+ | [YOLOX-Tiny](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.fpn.coco.800size.1x) | 5.06M | 6.45 | 416x416 | 31.7 |
+ | [YOLOX-S](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 9.0M | 26.8 | 640x640 | 39.6 |
+ | [YOLOX-M](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 25.3M | 73.8 | 640x640 | 46.4 |
+ | [YOLOX-L](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 54.2M | 155.6 | 640x640 | 50.0 |
+ | [YOLOX-X](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 99.1M | 281.9 | 640x640 | 51.2 |
+ | [YOLOX-Darknet53](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 63.72M | 185.3 | 640x640 | 47.3 |
+
+ ## Install OpenVINO Toolkit
+
+ Please visit the [OpenVINO homepage](https://docs.openvinotoolkit.org/latest/get_started_guides.html) for more details.
+
+ ## Set up the Environment
+
+ ### For Linux
+
+ **Option 1. Set up the environment temporarily. You need to run this command every time you start a new shell window.**
+
+ ```shell
+ source /opt/intel/openvino_2021/bin/setupvars.sh
+ ```
+
+ **Option 2. Set up the environment permanently.**
+
+ *Step1.* For Linux:
+ ```shell
+ vim ~/.bashrc
+ ```
+
+ *Step2.* Add the following line to the file:
+
+ ```shell
+ source /opt/intel/openvino_2021/bin/setupvars.sh
+ ```
+
+ *Step3.* Save and exit the file, then run:
+
+ ```shell
+ source ~/.bashrc
+ ```
+
+
+ ## Convert model
+
+ 1. Export an ONNX model
+
+ Please refer to the [ONNX tutorial]() for more details. **Note that you should set --opset to 10, otherwise the next step will fail.**
+
+ 2. Convert ONNX to OpenVINO
+
+ ```shell
+ cd <INSTALL_DIR>/openvino_2021/deployment_tools/model_optimizer
+ ```
+
+ Install the requirements for the conversion tool:
+
+ ```shell
+ sudo ./install_prerequisites/install_prerequisites_onnx.sh
+ ```
+
+ Then convert the model:
+ ```shell
+ python3 mo.py --input_model <ONNX_MODEL> --input_shape <INPUT_SHAPE> [--data_type FP16]
+ ```
+ For example:
+ ```shell
+ python3 mo.py --input_model yolox.onnx --input_shape [1,3,640,640] --data_type FP16
+ ```
+
+ ## Build
+
+ ### Linux
+ ```shell
+ source /opt/intel/openvino_2021/bin/setupvars.sh
+ mkdir build
+ cd build
+ cmake ..
+ make
+ ```
+
+ ## Demo
+
+ ### C++
+
+ ```shell
+ ./yolox_openvino <XML_MODEL_PATH> <IMAGE_PATH> <DEVICE>
+ ```
demo/OpenVINO/cpp/yolox_openvino.cpp ADDED
@@ -0,0 +1,531 @@
1
+ // Copyright (C) 2018-2021 Intel Corporation
2
+ // SPDX-License-Identifier: Apache-2.0
3
+ //
4
+
5
+ #include <iterator>
6
+ #include <memory>
7
+ #include <string>
8
+ #include <vector>
9
+ #include <opencv2/opencv.hpp>
10
+ #include <iostream>
11
+ #include <inference_engine.hpp>
12
+
13
+ using namespace InferenceEngine;
14
+
15
+ /**
16
+ * @brief Define names based depends on Unicode path support
17
+ */
18
+ #define tcout std::cout
19
+ #define file_name_t std::string
20
+ #define imread_t cv::imread
21
+ #define NMS_THRESH 0.65
22
+ #define BBOX_CONF_THRESH 0.3
23
+
24
+ static const int INPUT_W = 416;
25
+ static const int INPUT_H = 416;
26
+
27
+ cv::Mat static_resize(cv::Mat& img) {
28
+ float r = std::min(INPUT_W / (img.cols*1.0), INPUT_H / (img.rows*1.0));
29
+ // r = std::min(r, 1.0f);
30
+ int unpad_w = r * img.cols;
31
+ int unpad_h = r * img.rows;
32
+ cv::Mat re(unpad_h, unpad_w, CV_8UC3);
33
+ cv::resize(img, re, re.size());
34
+ cv::Mat out(INPUT_H, INPUT_W, CV_8UC3, cv::Scalar(114, 114, 114)); // cv::Mat takes (rows, cols)
35
+ re.copyTo(out(cv::Rect(0, 0, re.cols, re.rows)));
36
+ return out;
37
+ }
38
+
39
+ void blobFromImage(cv::Mat& img, Blob::Ptr& blob){
40
+ cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
41
+ int channels = 3;
42
+ int img_h = img.rows;
43
+ int img_w = img.cols;
44
+ std::vector<float> mean = {0.485, 0.456, 0.406};
45
+ std::vector<float> std = {0.229, 0.224, 0.225};
46
+ InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
47
+ if (!mblob)
48
+ {
49
+ THROW_IE_EXCEPTION << "We expect blob to be inherited from MemoryBlob in matU8ToBlob, "
50
+ << "but by fact we were not able to cast inputBlob to MemoryBlob";
51
+ }
52
+ // locked memory holder should be alive all time while access to its buffer happens
53
+ auto mblobHolder = mblob->wmap();
54
+
55
+ float *blob_data = mblobHolder.as<float *>();
56
+
57
+ for (size_t c = 0; c < channels; c++)
58
+ {
59
+ for (size_t h = 0; h < img_h; h++)
60
+ {
61
+ for (size_t w = 0; w < img_w; w++)
62
+ {
63
+ blob_data[c * img_w * img_h + h * img_w + w] =
64
+ (((float)img.at<cv::Vec3b>(h, w)[c]) / 255.0f - mean[c]) / std[c];
65
+ }
66
+ }
67
+ }
68
+ }
69
+
70
+
71
+ struct Object
72
+ {
73
+ cv::Rect_<float> rect;
74
+ int label;
75
+ float prob;
76
+ };
77
+
78
+ struct GridAndStride
79
+ {
80
+ int grid0;
81
+ int grid1;
82
+ int stride;
83
+ };
84
+
85
+ static void generate_grids_and_stride(const int target_size, std::vector<int>& strides, std::vector<GridAndStride>& grid_strides)
86
+ {
87
+ for (auto stride : strides)
88
+ {
89
+ int num_grid = target_size / stride;
90
+ for (int g1 = 0; g1 < num_grid; g1++)
91
+ {
92
+ for (int g0 = 0; g0 < num_grid; g0++)
93
+ {
94
+ grid_strides.push_back((GridAndStride){g0, g1, stride});
95
+ }
96
+ }
97
+ }
98
+ }
99
+
100
+
101
+ static void generate_yolox_proposals(std::vector<GridAndStride> grid_strides, const float* feat_ptr, float prob_threshold, std::vector<Object>& objects)
102
+ {
103
+ const int num_class = 80; // COCO has 80 classes. Modify this value on your own dataset.
104
+
105
+ const int num_anchors = grid_strides.size();
106
+
107
+ for (int anchor_idx = 0; anchor_idx < num_anchors; anchor_idx++)
108
+ {
109
+ const int grid0 = grid_strides[anchor_idx].grid0;
110
+ const int grid1 = grid_strides[anchor_idx].grid1;
111
+ const int stride = grid_strides[anchor_idx].stride;
112
+
113
+ const int basic_pos = anchor_idx * 85;
114
+
115
+ // yolox/models/yolo_head.py decode logic
116
+ // outputs[..., :2] = (outputs[..., :2] + grids) * strides
117
+ // outputs[..., 2:4] = torch.exp(outputs[..., 2:4]) * strides
118
+ float x_center = (feat_ptr[basic_pos + 0] + grid0) * stride;
119
+ float y_center = (feat_ptr[basic_pos + 1] + grid1) * stride;
120
+ float w = exp(feat_ptr[basic_pos + 2]) * stride;
121
+ float h = exp(feat_ptr[basic_pos + 3]) * stride;
122
+ float x0 = x_center - w * 0.5f;
123
+ float y0 = y_center - h * 0.5f;
124
+
125
+ float box_objectness = feat_ptr[basic_pos + 4];
126
+ for (int class_idx = 0; class_idx < num_class; class_idx++)
127
+ {
128
+ float box_cls_score = feat_ptr[basic_pos + 5 + class_idx];
129
+ float box_prob = box_objectness * box_cls_score;
130
+ if (box_prob > prob_threshold)
131
+ {
132
+ Object obj;
133
+ obj.rect.x = x0;
134
+ obj.rect.y = y0;
135
+ obj.rect.width = w;
136
+ obj.rect.height = h;
137
+ obj.label = class_idx;
138
+ obj.prob = box_prob;
139
+
140
+ objects.push_back(obj);
141
+ }
142
+
143
+ } // class loop
144
+
145
+ } // point anchor loop
146
+ }
147
+
148
+ static inline float intersection_area(const Object& a, const Object& b)
149
+ {
150
+ cv::Rect_<float> inter = a.rect & b.rect;
151
+ return inter.area();
152
+ }
153
+
154
+ static void qsort_descent_inplace(std::vector<Object>& faceobjects, int left, int right)
155
+ {
156
+ int i = left;
157
+ int j = right;
158
+ float p = faceobjects[(left + right) / 2].prob;
159
+
160
+ while (i <= j)
161
+ {
162
+ while (faceobjects[i].prob > p)
163
+ i++;
164
+
165
+ while (faceobjects[j].prob < p)
166
+ j--;
167
+
168
+ if (i <= j)
169
+ {
170
+ // swap
171
+ std::swap(faceobjects[i], faceobjects[j]);
172
+
173
+ i++;
174
+ j--;
175
+ }
176
+ }
177
+
178
+ #pragma omp parallel sections
179
+ {
180
+ #pragma omp section
181
+ {
182
+ if (left < j) qsort_descent_inplace(faceobjects, left, j);
183
+ }
184
+ #pragma omp section
185
+ {
186
+ if (i < right) qsort_descent_inplace(faceobjects, i, right);
187
+ }
188
+ }
189
+ }
190
+
191
+
192
+ static void qsort_descent_inplace(std::vector<Object>& objects)
193
+ {
194
+ if (objects.empty())
195
+ return;
196
+
197
+ qsort_descent_inplace(objects, 0, objects.size() - 1);
198
+ }
199
+
200
+ static void nms_sorted_bboxes(const std::vector<Object>& faceobjects, std::vector<int>& picked, float nms_threshold)
201
+ {
202
+ picked.clear();
203
+
204
+ const int n = faceobjects.size();
205
+
206
+ std::vector<float> areas(n);
207
+ for (int i = 0; i < n; i++)
208
+ {
209
+ areas[i] = faceobjects[i].rect.area();
210
+ }
211
+
212
+ for (int i = 0; i < n; i++)
213
+ {
214
+ const Object& a = faceobjects[i];
215
+
216
+ int keep = 1;
217
+ for (int j = 0; j < (int)picked.size(); j++)
218
+ {
219
+ const Object& b = faceobjects[picked[j]];
220
+
221
+ // intersection over union
222
+ float inter_area = intersection_area(a, b);
223
+ float union_area = areas[i] + areas[picked[j]] - inter_area;
224
+ // float IoU = inter_area / union_area
225
+ if (inter_area / union_area > nms_threshold)
226
+ keep = 0;
227
+ }
228
+
229
+ if (keep)
230
+ picked.push_back(i);
231
+ }
232
+ }
233
+
234
+
235
+ static void decode_outputs(const float* prob, std::vector<Object>& objects, float scale, const int img_w, const int img_h) {
236
+ std::vector<Object> proposals;
237
+ std::vector<int> strides = {8, 16, 32};
238
+ std::vector<GridAndStride> grid_strides;
239
+
240
+ generate_grids_and_stride(INPUT_W, strides, grid_strides);
241
+ generate_yolox_proposals(grid_strides, prob, BBOX_CONF_THRESH, proposals);
242
+ qsort_descent_inplace(proposals);
243
+
244
+ std::vector<int> picked;
245
+ nms_sorted_bboxes(proposals, picked, NMS_THRESH);
246
+ int count = picked.size();
247
+ objects.resize(count);
248
+
249
+ for (int i = 0; i < count; i++)
250
+ {
251
+ objects[i] = proposals[picked[i]];
252
+
253
+ // adjust offset to original unpadded
254
+ float x0 = (objects[i].rect.x) / scale;
255
+ float y0 = (objects[i].rect.y) / scale;
256
+ float x1 = (objects[i].rect.x + objects[i].rect.width) / scale;
257
+ float y1 = (objects[i].rect.y + objects[i].rect.height) / scale;
258
+
259
+ // clip
260
+ x0 = std::max(std::min(x0, (float)(img_w - 1)), 0.f);
261
+ y0 = std::max(std::min(y0, (float)(img_h - 1)), 0.f);
262
+ x1 = std::max(std::min(x1, (float)(img_w - 1)), 0.f);
263
+ y1 = std::max(std::min(y1, (float)(img_h - 1)), 0.f);
264
+
265
+ objects[i].rect.x = x0;
266
+ objects[i].rect.y = y0;
267
+ objects[i].rect.width = x1 - x0;
268
+ objects[i].rect.height = y1 - y0;
269
+ }
270
+ }
271
+
272
+ const float color_list[80][3] =
273
+ {
274
+ {0.000, 0.447, 0.741},
275
+ {0.850, 0.325, 0.098},
276
+ {0.929, 0.694, 0.125},
277
+ {0.494, 0.184, 0.556},
278
+ {0.466, 0.674, 0.188},
279
+ {0.301, 0.745, 0.933},
280
+ {0.635, 0.078, 0.184},
281
+ {0.300, 0.300, 0.300},
282
+ {0.600, 0.600, 0.600},
283
+ {1.000, 0.000, 0.000},
284
+ {1.000, 0.500, 0.000},
285
+ {0.749, 0.749, 0.000},
286
+ {0.000, 1.000, 0.000},
287
+ {0.000, 0.000, 1.000},
288
+ {0.667, 0.000, 1.000},
289
+ {0.333, 0.333, 0.000},
290
+ {0.333, 0.667, 0.000},
291
+ {0.333, 1.000, 0.000},
292
+ {0.667, 0.333, 0.000},
293
+ {0.667, 0.667, 0.000},
294
+ {0.667, 1.000, 0.000},
295
+ {1.000, 0.333, 0.000},
296
+ {1.000, 0.667, 0.000},
297
+ {1.000, 1.000, 0.000},
298
+ {0.000, 0.333, 0.500},
299
+ {0.000, 0.667, 0.500},
300
+ {0.000, 1.000, 0.500},
301
+ {0.333, 0.000, 0.500},
302
+ {0.333, 0.333, 0.500},
303
+ {0.333, 0.667, 0.500},
304
+ {0.333, 1.000, 0.500},
305
+ {0.667, 0.000, 0.500},
306
+ {0.667, 0.333, 0.500},
307
+ {0.667, 0.667, 0.500},
308
+ {0.667, 1.000, 0.500},
309
+ {1.000, 0.000, 0.500},
310
+ {1.000, 0.333, 0.500},
311
+ {1.000, 0.667, 0.500},
312
+ {1.000, 1.000, 0.500},
313
+ {0.000, 0.333, 1.000},
314
+ {0.000, 0.667, 1.000},
315
+ {0.000, 1.000, 1.000},
316
+ {0.333, 0.000, 1.000},
317
+ {0.333, 0.333, 1.000},
318
+ {0.333, 0.667, 1.000},
319
+ {0.333, 1.000, 1.000},
320
+ {0.667, 0.000, 1.000},
321
+ {0.667, 0.333, 1.000},
322
+ {0.667, 0.667, 1.000},
323
+ {0.667, 1.000, 1.000},
324
+ {1.000, 0.000, 1.000},
325
+ {1.000, 0.333, 1.000},
326
+ {1.000, 0.667, 1.000},
327
+ {0.333, 0.000, 0.000},
328
+ {0.500, 0.000, 0.000},
329
+ {0.667, 0.000, 0.000},
330
+ {0.833, 0.000, 0.000},
331
+ {1.000, 0.000, 0.000},
332
+ {0.000, 0.167, 0.000},
333
+ {0.000, 0.333, 0.000},
334
+ {0.000, 0.500, 0.000},
335
+ {0.000, 0.667, 0.000},
336
+ {0.000, 0.833, 0.000},
337
+ {0.000, 1.000, 0.000},
338
+ {0.000, 0.000, 0.167},
339
+ {0.000, 0.000, 0.333},
340
+ {0.000, 0.000, 0.500},
341
+ {0.000, 0.000, 0.667},
342
+ {0.000, 0.000, 0.833},
343
+ {0.000, 0.000, 1.000},
344
+ {0.000, 0.000, 0.000},
345
+ {0.143, 0.143, 0.143},
346
+ {0.286, 0.286, 0.286},
347
+ {0.429, 0.429, 0.429},
348
+ {0.571, 0.571, 0.571},
349
+ {0.714, 0.714, 0.714},
350
+ {0.857, 0.857, 0.857},
351
+ {0.000, 0.447, 0.741},
352
+ {0.314, 0.717, 0.741},
353
+ {0.50, 0.5, 0}
354
+ };
355
+
356
+ static void draw_objects(const cv::Mat& bgr, const std::vector<Object>& objects)
357
+ {
358
+ static const char* class_names[] = {
359
+ "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
360
+ "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
361
+ "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
362
+ "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
363
+ "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
364
+ "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
365
+ "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
366
+ "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
367
+ "hair drier", "toothbrush"
368
+ };
369
+
370
+ cv::Mat image = bgr.clone();
371
+
372
+ for (size_t i = 0; i < objects.size(); i++)
373
+ {
374
+ const Object& obj = objects[i];
375
+
376
+ fprintf(stderr, "%d = %.5f at %.2f %.2f %.2f x %.2f\n", obj.label, obj.prob,
377
+ obj.rect.x, obj.rect.y, obj.rect.width, obj.rect.height);
378
+
379
+ cv::Scalar color = cv::Scalar(color_list[obj.label][0], color_list[obj.label][1], color_list[obj.label][2]);
380
+ float c_mean = cv::mean(color)[0];
381
+ cv::Scalar txt_color;
382
+ if (c_mean > 0.5){
383
+ txt_color = cv::Scalar(0, 0, 0);
384
+ }else{
385
+ txt_color = cv::Scalar(255, 255, 255);
386
+ }
387
+
388
+ cv::rectangle(image, obj.rect, color * 255, 2);
389
+
390
+ char text[256];
391
+ sprintf(text, "%s %.1f%%", class_names[obj.label], obj.prob * 100);
392
+
393
+ int baseLine = 0;
394
+ cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_COMPLEX, 0.4, 1, &baseLine);
395
+
396
+ cv::Scalar txt_bk_color = color * 0.7 * 255;
397
+
398
+ int x = obj.rect.x;
399
+ int y = obj.rect.y + 1;
400
+ //int y = obj.rect.y - label_size.height - baseLine;
401
+ if (y > image.rows)
402
+ y = image.rows;
403
+ //if (x + label_size.width > image.cols)
404
+ //x = image.cols - label_size.width;
405
+
406
+ cv::rectangle(image, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)),
407
+ txt_bk_color, -1);
408
+
409
+ cv::putText(image, text, cv::Point(x, y + label_size.height),
410
+ cv::FONT_HERSHEY_COMPLEX, 0.4, txt_color, 1);
411
+ }
412
+
413
+ cv::imwrite("_demo.jpg" , image);
414
+ fprintf(stderr, "save vis file\n");
415
+ /* cv::imshow("image", image); */
416
+ /* cv::waitKey(0); */
417
+ }
418
+
419
+
420
+ int main(int argc, char* argv[]) {
421
+ try {
422
+ // ------------------------------ Parsing and validation of input arguments
423
+ // ---------------------------------
424
+ if (argc != 4) {
425
+ tcout << "Usage : " << argv[0] << " <path_to_model> <path_to_image> <device_name>" << std::endl;
426
+ return EXIT_FAILURE;
427
+ }
428
+
429
+ const file_name_t input_model {argv[1]};
430
+ const file_name_t input_image_path {argv[2]};
431
+ const std::string device_name {argv[3]};
432
+ // -----------------------------------------------------------------------------------------------------
433
+
434
+ // --------------------------- Step 1. Initialize inference engine core
435
+ // -------------------------------------
436
+ Core ie;
437
+ // -----------------------------------------------------------------------------------------------------
438
+
439
+ // Step 2. Read a model in OpenVINO Intermediate Representation (.xml and
440
+ // .bin files) or ONNX (.onnx file) format
441
+ CNNNetwork network = ie.ReadNetwork(input_model);
442
+ if (network.getOutputsInfo().size() != 1)
443
+ throw std::logic_error("Sample supports topologies with 1 output only");
444
+ if (network.getInputsInfo().size() != 1)
445
+ throw std::logic_error("Sample supports topologies with 1 input only");
446
+ // -----------------------------------------------------------------------------------------------------
447
+
448
+ // --------------------------- Step 3. Configure input & output
449
+ // ---------------------------------------------
450
+ // --------------------------- Prepare input blobs
451
+ // -----------------------------------------------------
452
+ InputInfo::Ptr input_info = network.getInputsInfo().begin()->second;
453
+ std::string input_name = network.getInputsInfo().begin()->first;
454
+
455
+ /* Mark input as resizable by setting of a resize algorithm.
456
+ * In this case we will be able to set an input blob of any shape to an
457
+ * infer request. Resize and layout conversions are executed automatically
458
+ * during inference */
459
+ //input_info->getPreProcess().setResizeAlgorithm(RESIZE_BILINEAR);
460
+ //input_info->setLayout(Layout::NHWC);
461
+ //input_info->setPrecision(Precision::FP32);
462
+
463
+ // --------------------------- Prepare output blobs
464
+ // ----------------------------------------------------
465
+ if (network.getOutputsInfo().empty()) {
466
+ std::cerr << "Network outputs info is empty" << std::endl;
467
+ return EXIT_FAILURE;
468
+ }
469
+ DataPtr output_info = network.getOutputsInfo().begin()->second;
470
+ std::string output_name = network.getOutputsInfo().begin()->first;
471
+
472
+ output_info->setPrecision(Precision::FP32);
473
+ // -----------------------------------------------------------------------------------------------------
474
+
475
+ // --------------------------- Step 4. Loading a model to the device
476
+ // ------------------------------------------
477
+ ExecutableNetwork executable_network = ie.LoadNetwork(network, device_name);
478
+ // -----------------------------------------------------------------------------------------------------
479
+
480
+ // --------------------------- Step 5. Create an infer request
481
+ // -------------------------------------------------
482
+ InferRequest infer_request = executable_network.CreateInferRequest();
483
+ // -----------------------------------------------------------------------------------------------------
484
+
485
+ // --------------------------- Step 6. Prepare input
486
+ // --------------------------------------------------------
487
+ /* Read input image to a blob and set it to an infer request without resize
488
+ * and layout conversions. */
489
+ cv::Mat image = imread_t(input_image_path);
490
+ cv::Mat pr_img = static_resize(image);
491
+ Blob::Ptr imgBlob = infer_request.GetBlob(input_name); // just wrap Mat data by Blob::Ptr
492
+ blobFromImage(pr_img, imgBlob);
493
+
494
+ // infer_request.SetBlob(input_name, imgBlob); // infer_request accepts input blob of any size
495
+ // -----------------------------------------------------------------------------------------------------
496
+
497
+ // --------------------------- Step 7. Do inference
498
+ // --------------------------------------------------------
499
+ /* Running the request synchronously */
500
+ infer_request.Infer();
501
+ // -----------------------------------------------------------------------------------------------------
502
+
503
+ // --------------------------- Step 8. Process output
504
+ // ------------------------------------------------------
505
+ const Blob::Ptr output_blob = infer_request.GetBlob(output_name);
506
+ MemoryBlob::CPtr moutput = as<MemoryBlob>(output_blob);
507
+ if (!moutput) {
508
+ throw std::logic_error("We expect output to be inherited from MemoryBlob, "
509
+ "but by fact we were not able to cast output to MemoryBlob");
510
+ }
511
+ // locked memory holder should be alive all time while access to its buffer
512
+ // happens
513
+ auto moutputHolder = moutput->rmap();
514
+ const float* net_pred = moutputHolder.as<const PrecisionTrait<Precision::FP32>::value_type*>();
515
+
516
+ const int image_size = 416;
517
+ int img_w = image.cols;
518
+ int img_h = image.rows;
519
+ float scale = std::min(INPUT_W / (image.cols*1.0), INPUT_H / (image.rows*1.0));
520
+ std::vector<Object> objects;
521
+
522
+ decode_outputs(net_pred, objects, scale, img_w, img_h);
523
+ draw_objects(image, objects);
524
+
525
+ // -----------------------------------------------------------------------------------------------------
526
+ } catch (const std::exception& ex) {
527
+ std::cerr << ex.what() << std::endl;
528
+ return EXIT_FAILURE;
529
+ }
530
+ return EXIT_SUCCESS;
531
+ }
demo/OpenVINO/python/README.md ADDED
@@ -0,0 +1,88 @@
+ # User Guide for Deploying YOLOX on OpenVINO
+
+ This tutorial includes a Python demo for OpenVINO, as well as some converted models.
+
+ ### Download OpenVINO models.
+ | Model | Parameters | GFLOPs | Test Size | mAP |
+ |:------| :----: | :----: | :---: | :---: |
+ | [YOLOX-Nano](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.res101.fpn.coco.800size.1x) | 0.91M | 1.08 | 416x416 | 25.3 |
+ | [YOLOX-Tiny](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.fpn.coco.800size.1x) | 5.06M | 6.45 | 416x416 | 31.7 |
+ | [YOLOX-S](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 9.0M | 26.8 | 640x640 | 39.6 |
+ | [YOLOX-M](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 25.3M | 73.8 | 640x640 | 46.4 |
+ | [YOLOX-L](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 54.2M | 155.6 | 640x640 | 50.0 |
+ | [YOLOX-X](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 99.1M | 281.9 | 640x640 | 51.2 |
+ | [YOLOX-Darknet53](https://github.com/Joker316701882/OTA/tree/main/playground/detection/coco/ota.x101.dcnv2.fpn.coco.800size.1x) | 63.72M | 185.3 | 640x640 | 47.3 |
+
+ ## Install OpenVINO Toolkit
+
+ Please visit the [OpenVINO homepage](https://docs.openvinotoolkit.org/latest/get_started_guides.html) for more details.
+
+ ## Set up the Environment
+
+ ### For Linux
+
+ **Option 1. Set up the environment temporarily. You need to run this command every time you start a new shell window.**
+
+ ```shell
+ source /opt/intel/openvino_2021/bin/setupvars.sh
+ ```
+
+ **Option 2. Set up the environment permanently.**
+
+ *Step1.* For Linux:
+ ```shell
+ vim ~/.bashrc
+ ```
+
+ *Step2.* Add the following line to the file:
+
+ ```shell
+ source /opt/intel/openvino_2021/bin/setupvars.sh
+ ```
+
+ *Step3.* Save and exit the file, then run:
+
+ ```shell
+ source ~/.bashrc
+ ```
+
+
+ ## Convert model
+
+ 1. Export an ONNX model
+
+ Please refer to the [ONNX tutorial]() for more details. **Note that you should set --opset to 10, otherwise the next step will fail.**
+
+ 2. Convert ONNX to OpenVINO
+
+ ```shell
+ cd <INSTALL_DIR>/openvino_2021/deployment_tools/model_optimizer
+ ```
+
+ Install the requirements for the conversion tool:
+
+ ```shell
+ sudo ./install_prerequisites/install_prerequisites_onnx.sh
+ ```
+
+ Then convert the model:
+ ```shell
+ python3 mo.py --input_model <ONNX_MODEL> --input_shape <INPUT_SHAPE> [--data_type FP16]
+ ```
+ For example:
+ ```shell
+ python3 mo.py --input_model yolox.onnx --input_shape [1,3,640,640] --data_type FP16
+ ```
+
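Before running the demo, you can quickly verify that the generated IR loads and report its input shape. A minimal sketch, assuming the `yolox.xml`/`yolox.bin` files produced by mo.py above:

```python
# Minimal sanity check for the converted IR, using the same Inference Engine API as the demo.
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="yolox.xml", weights="yolox.bin")
input_blob = next(iter(net.input_info))
print(input_blob, net.input_info[input_blob].input_data.shape)  # e.g. (1, 3, 640, 640)
```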
+ ## Demo
+
+ ### Python
+
+ ```shell
+ python openvino_inference.py -m <XML_MODEL_PATH> -i <IMAGE_PATH>
+ ```
+ or
+ ```shell
+ python openvino_inference.py -m <XML_MODEL_PATH> -i <IMAGE_PATH> -o <OUTPUT_DIR> -s <SCORE_THR> -d <DEVICE>
+ ```
+
+
demo/OpenVINO/python/demo_utils.py ADDED
@@ -0,0 +1,86 @@
1
+ import numpy as np
2
+
3
+ import os
4
+
5
+
6
+ def mkdir(path):
7
+ if not os.path.exists(path):
8
+ os.makedirs(path)
9
+
10
+
11
+ def nms(boxes, scores, nms_thr):
12
+ """Single class NMS implemented in Numpy."""
13
+ x1 = boxes[:, 0]
14
+ y1 = boxes[:, 1]
15
+ x2 = boxes[:, 2]
16
+ y2 = boxes[:, 3]
17
+
18
+ areas = (x2 - x1 + 1) * (y2 - y1 + 1)
19
+ order = scores.argsort()[::-1]
20
+
21
+ keep = []
22
+ while order.size > 0:
23
+ i = order[0]
24
+ keep.append(i)
25
+ xx1 = np.maximum(x1[i], x1[order[1:]])
26
+ yy1 = np.maximum(y1[i], y1[order[1:]])
27
+ xx2 = np.minimum(x2[i], x2[order[1:]])
28
+ yy2 = np.minimum(y2[i], y2[order[1:]])
29
+
30
+ w = np.maximum(0.0, xx2 - xx1 + 1)
31
+ h = np.maximum(0.0, yy2 - yy1 + 1)
32
+ inter = w * h
33
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
34
+
35
+ inds = np.where(ovr <= nms_thr)[0]
36
+ order = order[inds + 1]
37
+
38
+ return keep
39
+
40
+
41
+ def multiclass_nms(boxes, scores, nms_thr, score_thr):
42
+ """Multiclass NMS implemented in Numpy"""
43
+ final_dets = []
44
+ num_classes = scores.shape[1]
45
+ for cls_ind in range(num_classes):
46
+ cls_scores = scores[:, cls_ind]
47
+ valid_score_mask = cls_scores > score_thr
48
+ if valid_score_mask.sum() == 0:
49
+ continue
50
+ else:
51
+ valid_scores = cls_scores[valid_score_mask]
52
+ valid_boxes = boxes[valid_score_mask]
53
+ keep = nms(valid_boxes, valid_scores, nms_thr)
54
+ if len(keep) > 0:
55
+ cls_inds = np.ones((len(keep), 1)) * cls_ind
56
+ dets = np.concatenate([valid_boxes[keep], valid_scores[keep, None], cls_inds], 1)
57
+ final_dets.append(dets)
58
+ return np.concatenate(final_dets, 0)
59
+
60
+
61
+ def postprocess(outputs, img_size, p6=False):
62
+
63
+ grids = []
64
+ expanded_strides = []
65
+
66
+ if not p6:
67
+ strides = [8, 16, 32]
68
+ else:
69
+ strides = [8, 16, 32, 64]
70
+
71
+ hsizes = [img_size[0]//stride for stride in strides]
72
+ wsizes = [img_size[1]//stride for stride in strides]
73
+
74
+ for hsize, wsize, stride in zip(hsizes, wsizes, strides):
75
+ xv, yv = np.meshgrid(np.arange(hsize), np.arange(wsize))
76
+ grid = np.stack((xv, yv), 2).reshape(1, -1, 2)
77
+ grids.append(grid)
78
+ shape = grid.shape[:2]
79
+ expanded_strides.append(np.full((*shape, 1), stride))
80
+
81
+ grids = np.concatenate(grids, 1)
82
+ expanded_strides = np.concatenate(expanded_strides, 1)
83
+ outputs[..., :2] = (outputs[..., :2] + grids) * expanded_strides
84
+ outputs[..., 2:4] = np.exp(outputs[..., 2:4]) * expanded_strides
85
+
86
+ return outputs
demo/OpenVINO/python/openvino_inference.py ADDED
@@ -0,0 +1,155 @@
1
+ #!/usr/bin/env python3
2
+ # -*- coding: utf-8 -*-
3
+ # Copyright (C) 2018-2021 Intel Corporation
4
+ # SPDX-License-Identifier: Apache-2.0
5
+ import argparse
6
+ import logging as log
7
+ import os
8
+ import sys
9
+
10
+ import cv2
11
+ import numpy as np
12
+
13
+ from demo_utils import mkdir, multiclass_nms, postprocess
14
+ from openvino.inference_engine import IECore
15
+ from yolox.data.data_augment import preproc as preprocess
16
+ from yolox.data.datasets import COCO_CLASSES
17
+ from yolox.utils.visualize import vis
18
+
19
+
20
+ def parse_args() -> argparse.Namespace:
21
+ """Parse and return command line arguments"""
22
+ parser = argparse.ArgumentParser(add_help=False)
23
+ args = parser.add_argument_group('Options')
24
+ args.add_argument(
25
+ '-h',
26
+ '--help',
27
+ action='help',
28
+ help='Show this help message and exit.')
29
+ args.add_argument(
30
+ '-m',
31
+ '--model',
32
+ required=True,
33
+ type=str,
34
+ help='Required. Path to an .xml or .onnx file with a trained model.')
35
+ args.add_argument(
36
+ '-i',
37
+ '--input',
38
+ required=True,
39
+ type=str,
40
+ help='Required. Path to an image file.')
41
+ args.add_argument(
42
+ '-o',
43
+ '--output_dir',
44
+ type=str,
45
+ default='demo_output',
46
+ help='Path to your output dir.')
47
+ args.add_argument(
48
+ '-s',
49
+ '--score_thr',
50
+ type=float,
51
+ default=0.3,
52
+ help="Score threshould to visualize the result.")
53
+ args.add_argument(
54
+ '-d',
55
+ '--device',
56
+ default='CPU',
57
+ type=str,
58
+ help='Optional. Specify the target device to infer on; CPU, GPU, \
59
+ MYRIAD, HDDL or HETERO: is acceptable. The sample will look \
60
+ for a suitable plugin for device specified. Default value \
61
+ is CPU.')
62
+ args.add_argument(
63
+ '--labels',
64
+ default=None,
65
+ type=str,
66
+ help='Optional. Path to a labels mapping file.')
67
+ args.add_argument(
68
+ '-nt',
69
+ '--number_top',
70
+ default=10,
71
+ type=int,
72
+ help='Optional. Number of top results.')
73
+ return parser.parse_args()
74
+
75
+
76
+ def main():
77
+ log.basicConfig(format='[ %(levelname)s ] %(message)s', level=log.INFO, stream=sys.stdout)
78
+ args = parse_args()
79
+
80
+ # ---------------------------Step 1. Initialize inference engine core--------------------------------------------------
81
+ log.info('Creating Inference Engine')
82
+ ie = IECore()
83
+
84
+ # ---------------------------Step 2. Read a model in OpenVINO Intermediate Representation or ONNX format---------------
85
+ log.info(f'Reading the network: {args.model}')
86
+ # (.xml and .bin files) or (.onnx file)
87
+ net = ie.read_network(model=args.model)
88
+
89
+ if len(net.input_info) != 1:
90
+ log.error('Sample supports only single input topologies')
91
+ return -1
92
+ if len(net.outputs) != 1:
93
+ log.error('Sample supports only single output topologies')
94
+ return -1
95
+
96
+ # ---------------------------Step 3. Configure input & output----------------------------------------------------------
97
+ log.info('Configuring input and output blobs')
98
+ # Get names of input and output blobs
99
+ input_blob = next(iter(net.input_info))
100
+ out_blob = next(iter(net.outputs))
101
+
102
+ # Set input and output precision manually
103
+ net.input_info[input_blob].precision = 'FP32'
104
+ net.outputs[out_blob].precision = 'FP16'
105
+
106
+ # Get a number of classes recognized by a model
107
+ num_of_classes = max(net.outputs[out_blob].shape)
108
+
109
+ # ---------------------------Step 4. Loading model to the device-------------------------------------------------------
110
+ log.info('Loading the model to the plugin')
111
+ exec_net = ie.load_network(network=net, device_name=args.device)
112
+
113
+ # ---------------------------Step 5. Create infer request--------------------------------------------------------------
114
+ # load_network() method of the IECore class with a specified number of requests (default 1) returns an ExecutableNetwork
115
+ # instance which stores infer requests. So you already created Infer requests in the previous step.
116
+
117
+ # ---------------------------Step 6. Prepare input---------------------------------------------------------------------
118
+ origin_img = cv2.imread(args.input)
119
+ _, _, h, w = net.input_info[input_blob].input_data.shape
120
+ mean = (0.485, 0.456, 0.406)
121
+ std = (0.229, 0.224, 0.225)
122
+ image, ratio = preprocess(origin_img, (h, w), mean, std)
123
+
124
+ # ---------------------------Step 7. Do inference----------------------------------------------------------------------
125
+ log.info('Starting inference in synchronous mode')
126
+ res = exec_net.infer(inputs={input_blob: image})
127
+
128
+ # ---------------------------Step 8. Process output--------------------------------------------------------------------
129
+ res = res[out_blob]
130
+
131
+ predictions = postprocess(res, (h, w), p6=False)[0]
132
+
133
+ boxes = predictions[:, :4]
134
+ scores = predictions[:, 4, None] * predictions[:, 5:]
135
+
136
+ boxes_xyxy = np.ones_like(boxes)
137
+ boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2]/2.
138
+ boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3]/2.
139
+ boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2]/2.
140
+ boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3]/2.
141
+ boxes_xyxy /= ratio
142
+ dets = multiclass_nms(boxes_xyxy, scores, nms_thr=0.65, score_thr=0.1)
143
+
144
+ final_boxes = dets[:, :4]
145
+ final_scores, final_cls_inds = dets[:, 4], dets[:, 5]
146
+ origin_img = vis(origin_img, final_boxes, final_scores, final_cls_inds,
147
+ conf=args.score_thr, class_names=COCO_CLASSES)
148
+
149
+ mkdir(args.output_dir)
150
+ output_path = os.path.join(args.output_dir, args.input.split("/")[-1])
151
+ cv2.imwrite(output_path, origin_img)
152
+
153
+
154
+ if __name__ == '__main__':
155
+ sys.exit(main())
demo/TensorRT/cpp/CMakeLists.txt ADDED
@@ -0,0 +1,36 @@
1
+ cmake_minimum_required(VERSION 2.6)
2
+
3
+ project(yolox)
4
+
5
+ add_definitions(-std=c++11)
6
+
7
+ option(CUDA_USE_STATIC_CUDA_RUNTIME OFF)
8
+ set(CMAKE_CXX_STANDARD 11)
9
+ set(CMAKE_BUILD_TYPE Debug)
10
+
11
+ find_package(CUDA REQUIRED)
12
+
13
+ include_directories(${PROJECT_SOURCE_DIR}/include)
14
+ # include and link dirs of cuda and tensorrt, you need adapt them if yours are different
15
+ # cuda
16
+ include_directories(/data/cuda/cuda-10.2/cuda/include)
17
+ link_directories(/data/cuda/cuda-10.2/cuda/lib64)
18
+ # cudnn
19
+ include_directories(/data/cuda/cuda-10.2/cudnn/v8.0.4/include)
20
+ link_directories(/data/cuda/cuda-10.2/cudnn/v8.0.4/lib64)
21
+ # tensorrt
22
+ include_directories(/data/cuda/cuda-10.2/TensorRT/v7.2.1.6/include)
23
+ link_directories(/data/cuda/cuda-10.2/TensorRT/v7.2.1.6/lib)
24
+
25
+ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -Wall -Ofast -Wfatal-errors -D_MWAITXINTRIN_H_INCLUDED")
26
+
27
+ find_package(OpenCV)
28
+ include_directories(${OpenCV_INCLUDE_DIRS})
29
+
30
+ add_executable(yolox ${PROJECT_SOURCE_DIR}/yolox.cpp)
31
+ target_link_libraries(yolox nvinfer)
32
+ target_link_libraries(yolox cudart)
33
+ target_link_libraries(yolox ${OpenCV_LIBS})
34
+
35
+ add_definitions(-O2 -pthread)
36
+
demo/TensorRT/cpp/README.md ADDED
@@ -0,0 +1,43 @@
+ # User Guide for Deploying YOLOX on TensorRT (C++)
+
+ Since YOLOX models are easy to convert to TensorRT with the [torch2trt repo](https://github.com/NVIDIA-AI-IOT/torch2trt),
+ our C++ demo does not include model conversion or network construction, unlike other TensorRT demos.
+
+
+ ## Step 1: Prepare the serialized engine file
+
+ Follow the TensorRT [Python demo README](../Python/README.md) to convert and save the serialized engine file; a rough sketch of the conversion is shown below.
+
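For reference, a hedged sketch of what such a torch2trt conversion typically looks like; the experiment/checkpoint helpers and option names are assumptions based on the training tools above, and the linked Python demo README is authoritative:

```python
# Hypothetical sketch: convert a YOLOX model with torch2trt and save a serialized engine.
import torch
from torch2trt import torch2trt
from yolox.exp import get_exp

exp = get_exp(None, "yolox-s")
model = exp.get_model().eval().cuda()
ckpt = torch.load("yolox_s.pth.tar", map_location="cpu")
model.load_state_dict(ckpt["model"])  # checkpoint key name is an assumption

x = torch.ones(1, 3, exp.test_size[0], exp.test_size[1]).cuda()
model_trt = torch2trt(model, [x], fp16_mode=True)

# Serialize the underlying TensorRT engine to the file consumed by this C++ demo.
with open("model_trt.engine", "wb") as f:
    f.write(model_trt.engine.serialize())
```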
+ ## Step 2: Build the demo
+
+ Please follow the [TensorRT Installation Guide](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html) to install TensorRT.
+
+ Install OpenCV with ```sudo apt-get install libopencv-dev```.
+
+ Build the demo:
+
+ ```shell
+ mkdir build
+ cd build
+ cmake ..
+ make
+ ```
+
+ Move the `model_trt.engine` file generated in Step 1 (saved in the experiment output dir) to the build dir:
+
+ ```shell
+ mv /path/to/your/exp/output/dir/model_trt.engine .
+ ```
+
+ Then run the demo:
+
+ ```shell
+ ./yolox -d /your/path/to/yolox/assets
+ ```
+
+ or
+
+ ```shell
+ ./yolox -d <img dir>
+ ```
demo/TensorRT/cpp/logging.h ADDED
@@ -0,0 +1,503 @@
1
+ /*
2
+ * Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
3
+ *
4
+ * Licensed under the Apache License, Version 2.0 (the "License");
5
+ * you may not use this file except in compliance with the License.
6
+ * You may obtain a copy of the License at
7
+ *
8
+ * http://www.apache.org/licenses/LICENSE-2.0
9
+ *
10
+ * Unless required by applicable law or agreed to in writing, software
11
+ * distributed under the License is distributed on an "AS IS" BASIS,
12
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13
+ * See the License for the specific language governing permissions and
14
+ * limitations under the License.
15
+ */
16
+
17
+ #ifndef TENSORRT_LOGGING_H
18
+ #define TENSORRT_LOGGING_H
19
+
20
+ #include "NvInferRuntimeCommon.h"
21
+ #include <cassert>
22
+ #include <ctime>
23
+ #include <iomanip>
24
+ #include <iostream>
25
+ #include <ostream>
26
+ #include <sstream>
27
+ #include <string>
28
+
29
+ using Severity = nvinfer1::ILogger::Severity;
30
+
31
+ class LogStreamConsumerBuffer : public std::stringbuf
32
+ {
33
+ public:
34
+ LogStreamConsumerBuffer(std::ostream& stream, const std::string& prefix, bool shouldLog)
35
+ : mOutput(stream)
36
+ , mPrefix(prefix)
37
+ , mShouldLog(shouldLog)
38
+ {
39
+ }
40
+
41
+ LogStreamConsumerBuffer(LogStreamConsumerBuffer&& other)
42
+ : mOutput(other.mOutput)
43
+ {
44
+ }
45
+
46
+ ~LogStreamConsumerBuffer()
47
+ {
48
+ // std::streambuf::pbase() gives a pointer to the beginning of the buffered part of the output sequence
49
+ // std::streambuf::pptr() gives a pointer to the current position of the output sequence
50
+ // if the pointer to the beginning is not equal to the pointer to the current position,
51
+ // call putOutput() to log the output to the stream
52
+ if (pbase() != pptr())
53
+ {
54
+ putOutput();
55
+ }
56
+ }
57
+
58
+ // synchronizes the stream buffer and returns 0 on success
59
+ // synchronizing the stream buffer consists of inserting the buffer contents into the stream,
60
+ // resetting the buffer and flushing the stream
61
+ virtual int sync()
62
+ {
63
+ putOutput();
64
+ return 0;
65
+ }
66
+
67
+ void putOutput()
68
+ {
69
+ if (mShouldLog)
70
+ {
71
+ // prepend timestamp
72
+ std::time_t timestamp = std::time(nullptr);
73
+ tm* tm_local = std::localtime(&timestamp);
74
+ std::cout << "[";
75
+ std::cout << std::setw(2) << std::setfill('0') << 1 + tm_local->tm_mon << "/";
76
+ std::cout << std::setw(2) << std::setfill('0') << tm_local->tm_mday << "/";
77
+ std::cout << std::setw(4) << std::setfill('0') << 1900 + tm_local->tm_year << "-";
78
+ std::cout << std::setw(2) << std::setfill('0') << tm_local->tm_hour << ":";
79
+ std::cout << std::setw(2) << std::setfill('0') << tm_local->tm_min << ":";
80
+ std::cout << std::setw(2) << std::setfill('0') << tm_local->tm_sec << "] ";
81
+ // std::stringbuf::str() gets the string contents of the buffer
82
+ // insert the buffer contents pre-appended by the appropriate prefix into the stream
83
+ mOutput << mPrefix << str();
84
+ // set the buffer to empty
85
+ str("");
86
+ // flush the stream
87
+ mOutput.flush();
88
+ }
89
+ }
90
+
91
+ void setShouldLog(bool shouldLog)
92
+ {
93
+ mShouldLog = shouldLog;
94
+ }
95
+
96
+ private:
97
+ std::ostream& mOutput;
98
+ std::string mPrefix;
99
+ bool mShouldLog;
100
+ };
101
+
102
+ //!
103
+ //! \class LogStreamConsumerBase
104
+ //! \brief Convenience object used to initialize LogStreamConsumerBuffer before std::ostream in LogStreamConsumer
105
+ //!
106
+ class LogStreamConsumerBase
107
+ {
108
+ public:
109
+ LogStreamConsumerBase(std::ostream& stream, const std::string& prefix, bool shouldLog)
110
+ : mBuffer(stream, prefix, shouldLog)
111
+ {
112
+ }
113
+
114
+ protected:
115
+ LogStreamConsumerBuffer mBuffer;
116
+ };
117
+
118
+ //!
119
+ //! \class LogStreamConsumer
120
+ //! \brief Convenience object used to facilitate use of C++ stream syntax when logging messages.
121
+ //! Order of base classes is LogStreamConsumerBase and then std::ostream.
122
+ //! This is because the LogStreamConsumerBase class is used to initialize the LogStreamConsumerBuffer member field
123
+ //! in LogStreamConsumer and then the address of the buffer is passed to std::ostream.
124
+ //! This is necessary to prevent the address of an uninitialized buffer from being passed to std::ostream.
125
+ //! Please do not change the order of the parent classes.
126
+ //!
127
+ class LogStreamConsumer : protected LogStreamConsumerBase, public std::ostream
128
+ {
129
+ public:
130
+ //! \brief Creates a LogStreamConsumer which logs messages with level severity.
131
+ //! Reportable severity determines if the messages are severe enough to be logged.
132
+ LogStreamConsumer(Severity reportableSeverity, Severity severity)
133
+ : LogStreamConsumerBase(severityOstream(severity), severityPrefix(severity), severity <= reportableSeverity)
134
+ , std::ostream(&mBuffer) // links the stream buffer with the stream
135
+ , mShouldLog(severity <= reportableSeverity)
136
+ , mSeverity(severity)
137
+ {
138
+ }
139
+
140
+ LogStreamConsumer(LogStreamConsumer&& other)
141
+ : LogStreamConsumerBase(severityOstream(other.mSeverity), severityPrefix(other.mSeverity), other.mShouldLog)
142
+ , std::ostream(&mBuffer) // links the stream buffer with the stream
143
+ , mShouldLog(other.mShouldLog)
144
+ , mSeverity(other.mSeverity)
145
+ {
146
+ }
147
+
148
+ void setReportableSeverity(Severity reportableSeverity)
149
+ {
150
+ mShouldLog = mSeverity <= reportableSeverity;
151
+ mBuffer.setShouldLog(mShouldLog);
152
+ }
153
+
154
+ private:
155
+ static std::ostream& severityOstream(Severity severity)
156
+ {
157
+ return severity >= Severity::kINFO ? std::cout : std::cerr;
158
+ }
159
+
160
+ static std::string severityPrefix(Severity severity)
161
+ {
162
+ switch (severity)
163
+ {
164
+ case Severity::kINTERNAL_ERROR: return "[F] ";
165
+ case Severity::kERROR: return "[E] ";
166
+ case Severity::kWARNING: return "[W] ";
167
+ case Severity::kINFO: return "[I] ";
168
+ case Severity::kVERBOSE: return "[V] ";
169
+ default: assert(0); return "";
170
+ }
171
+ }
172
+
173
+ bool mShouldLog;
174
+ Severity mSeverity;
175
+ };
176
+
177
+ //! \class Logger
178
+ //!
179
+ //! \brief Class which manages logging of TensorRT tools and samples
180
+ //!
181
+ //! \details This class provides a common interface for TensorRT tools and samples to log information to the console,
182
+ //! and supports logging two types of messages:
183
+ //!
184
+ //! - Debugging messages with an associated severity (info, warning, error, or internal error/fatal)
185
+ //! - Test pass/fail messages
186
+ //!
187
+ //! The advantage of having all samples use this class for logging as opposed to emitting directly to stdout/stderr is
188
+ //! that the logic for controlling the verbosity and formatting of sample output is centralized in one location.
189
+ //!
190
+ //! In the future, this class could be extended to support dumping test results to a file in some standard format
191
+ //! (for example, JUnit XML), and providing additional metadata (e.g. timing the duration of a test run).
192
+ //!
193
+ //! TODO: For backwards compatibility with existing samples, this class inherits directly from the nvinfer1::ILogger
194
+ //! interface, which is problematic since there isn't a clean separation between messages coming from the TensorRT
195
+ //! library and messages coming from the sample.
196
+ //!
197
+ //! In the future (once all samples are updated to use Logger::getTRTLogger() to access the ILogger) we can refactor the
198
+ //! class to eliminate the inheritance and instead make the nvinfer1::ILogger implementation a member of the Logger
199
+ //! object.
200
+
201
+ class Logger : public nvinfer1::ILogger
202
+ {
203
+ public:
204
+ Logger(Severity severity = Severity::kWARNING)
205
+ : mReportableSeverity(severity)
206
+ {
207
+ }
208
+
209
+ //!
210
+ //! \enum TestResult
211
+ //! \brief Represents the state of a given test
212
+ //!
213
+ enum class TestResult
214
+ {
215
+ kRUNNING, //!< The test is running
216
+ kPASSED, //!< The test passed
217
+ kFAILED, //!< The test failed
218
+ kWAIVED //!< The test was waived
219
+ };
220
+
221
+ //!
222
+ //! \brief Forward-compatible method for retrieving the nvinfer::ILogger associated with this Logger
223
+ //! \return The nvinfer1::ILogger associated with this Logger
224
+ //!
225
+ //! TODO Once all samples are updated to use this method to register the logger with TensorRT,
226
+ //! we can eliminate the inheritance of Logger from ILogger
227
+ //!
228
+ nvinfer1::ILogger& getTRTLogger()
229
+ {
230
+ return *this;
231
+ }
232
+
233
+ //!
234
+ //! \brief Implementation of the nvinfer1::ILogger::log() virtual method
235
+ //!
236
+ //! Note samples should not be calling this function directly; it will eventually go away once we eliminate the
237
+ //! inheritance from nvinfer1::ILogger
238
+ //!
239
+ void log(Severity severity, const char* msg) override
240
+ {
241
+ LogStreamConsumer(mReportableSeverity, severity) << "[TRT] " << std::string(msg) << std::endl;
242
+ }
243
+
244
+ //!
245
+ //! \brief Method for controlling the verbosity of logging output
246
+ //!
247
+ //! \param severity The logger will only emit messages that have severity of this level or higher.
248
+ //!
249
+ void setReportableSeverity(Severity severity)
250
+ {
251
+ mReportableSeverity = severity;
252
+ }
253
+
254
+ //!
255
+ //! \brief Opaque handle that holds logging information for a particular test
256
+ //!
257
+ //! This object is an opaque handle to information used by the Logger to print test results.
258
+ //! The sample must call Logger::defineTest() in order to obtain a TestAtom that can be used
259
+ //! with Logger::reportTest{Start,End}().
260
+ //!
261
+ class TestAtom
262
+ {
263
+ public:
264
+ TestAtom(TestAtom&&) = default;
265
+
266
+ private:
267
+ friend class Logger;
268
+
269
+ TestAtom(bool started, const std::string& name, const std::string& cmdline)
270
+ : mStarted(started)
271
+ , mName(name)
272
+ , mCmdline(cmdline)
273
+ {
274
+ }
275
+
276
+ bool mStarted;
277
+ std::string mName;
278
+ std::string mCmdline;
279
+ };
280
+
281
+ //!
282
+ //! \brief Define a test for logging
283
+ //!
284
+ //! \param[in] name The name of the test. This should be a string starting with
285
+ //! "TensorRT" and containing dot-separated strings containing
286
+ //! the characters [A-Za-z0-9_].
287
+ //! For example, "TensorRT.sample_googlenet"
288
+ //! \param[in] cmdline The command line used to reproduce the test
289
+ //
290
+ //! \return a TestAtom that can be used in Logger::reportTest{Start,End}().
291
+ //!
292
+ static TestAtom defineTest(const std::string& name, const std::string& cmdline)
293
+ {
294
+ return TestAtom(false, name, cmdline);
295
+ }
296
+
297
+ //!
298
+ //! \brief A convenience overloaded version of defineTest() that accepts an array of command-line arguments
299
+ //! as input
300
+ //!
301
+ //! \param[in] name The name of the test
302
+ //! \param[in] argc The number of command-line arguments
303
+ //! \param[in] argv The array of command-line arguments (given as C strings)
304
+ //!
305
+ //! \return a TestAtom that can be used in Logger::reportTest{Start,End}().
306
+ static TestAtom defineTest(const std::string& name, int argc, char const* const* argv)
307
+ {
308
+ auto cmdline = genCmdlineString(argc, argv);
309
+ return defineTest(name, cmdline);
310
+ }
311
+
312
+ //!
313
+ //! \brief Report that a test has started.
314
+ //!
315
+ //! \pre reportTestStart() has not been called yet for the given testAtom
316
+ //!
317
+ //! \param[in] testAtom The handle to the test that has started
318
+ //!
319
+ static void reportTestStart(TestAtom& testAtom)
320
+ {
321
+ reportTestResult(testAtom, TestResult::kRUNNING);
322
+ assert(!testAtom.mStarted);
323
+ testAtom.mStarted = true;
324
+ }
325
+
326
+ //!
327
+ //! \brief Report that a test has ended.
328
+ //!
329
+ //! \pre reportTestStart() has been called for the given testAtom
330
+ //!
331
+ //! \param[in] testAtom The handle to the test that has ended
332
+ //! \param[in] result The result of the test. Should be one of TestResult::kPASSED,
333
+ //! TestResult::kFAILED, TestResult::kWAIVED
334
+ //!
335
+ static void reportTestEnd(const TestAtom& testAtom, TestResult result)
336
+ {
337
+ assert(result != TestResult::kRUNNING);
338
+ assert(testAtom.mStarted);
339
+ reportTestResult(testAtom, result);
340
+ }
341
+
342
+ static int reportPass(const TestAtom& testAtom)
343
+ {
344
+ reportTestEnd(testAtom, TestResult::kPASSED);
345
+ return EXIT_SUCCESS;
346
+ }
347
+
348
+ static int reportFail(const TestAtom& testAtom)
349
+ {
350
+ reportTestEnd(testAtom, TestResult::kFAILED);
351
+ return EXIT_FAILURE;
352
+ }
353
+
354
+ static int reportWaive(const TestAtom& testAtom)
355
+ {
356
+ reportTestEnd(testAtom, TestResult::kWAIVED);
357
+ return EXIT_SUCCESS;
358
+ }
359
+
360
+ static int reportTest(const TestAtom& testAtom, bool pass)
361
+ {
362
+ return pass ? reportPass(testAtom) : reportFail(testAtom);
363
+ }
364
+
365
+ Severity getReportableSeverity() const
366
+ {
367
+ return mReportableSeverity;
368
+ }
369
+
370
+ private:
371
+ //!
372
+ //! \brief returns an appropriate string for prefixing a log message with the given severity
373
+ //!
374
+ static const char* severityPrefix(Severity severity)
375
+ {
376
+ switch (severity)
377
+ {
378
+ case Severity::kINTERNAL_ERROR: return "[F] ";
379
+ case Severity::kERROR: return "[E] ";
380
+ case Severity::kWARNING: return "[W] ";
381
+ case Severity::kINFO: return "[I] ";
382
+ case Severity::kVERBOSE: return "[V] ";
383
+ default: assert(0); return "";
384
+ }
385
+ }
386
+
387
+ //!
388
+ //! \brief returns an appropriate string for prefixing a test result message with the given result
389
+ //!
390
+ static const char* testResultString(TestResult result)
391
+ {
392
+ switch (result)
393
+ {
394
+ case TestResult::kRUNNING: return "RUNNING";
395
+ case TestResult::kPASSED: return "PASSED";
396
+ case TestResult::kFAILED: return "FAILED";
397
+ case TestResult::kWAIVED: return "WAIVED";
398
+ default: assert(0); return "";
399
+ }
400
+ }
401
+
402
+ //!
403
+ //! \brief returns an appropriate output stream (cout or cerr) to use with the given severity
404
+ //!
405
+ static std::ostream& severityOstream(Severity severity)
406
+ {
407
+ return severity >= Severity::kINFO ? std::cout : std::cerr;
408
+ }
409
+
410
+ //!
411
+ //! \brief method that implements logging test results
412
+ //!
413
+ static void reportTestResult(const TestAtom& testAtom, TestResult result)
414
+ {
415
+ severityOstream(Severity::kINFO) << "&&&& " << testResultString(result) << " " << testAtom.mName << " # "
416
+ << testAtom.mCmdline << std::endl;
417
+ }
418
+
419
+ //!
420
+ //! \brief generate a command line string from the given (argc, argv) values
421
+ //!
422
+ static std::string genCmdlineString(int argc, char const* const* argv)
423
+ {
424
+ std::stringstream ss;
425
+ for (int i = 0; i < argc; i++)
426
+ {
427
+ if (i > 0)
428
+ ss << " ";
429
+ ss << argv[i];
430
+ }
431
+ return ss.str();
432
+ }
433
+
434
+ Severity mReportableSeverity;
435
+ };
436
+
437
+ namespace
438
+ {
439
+
440
+ //!
441
+ //! \brief produces a LogStreamConsumer object that can be used to log messages of severity kVERBOSE
442
+ //!
443
+ //! Example usage:
444
+ //!
445
+ //! LOG_VERBOSE(logger) << "hello world" << std::endl;
446
+ //!
447
+ inline LogStreamConsumer LOG_VERBOSE(const Logger& logger)
448
+ {
449
+ return LogStreamConsumer(logger.getReportableSeverity(), Severity::kVERBOSE);
450
+ }
451
+
452
+ //!
453
+ //! \brief produces a LogStreamConsumer object that can be used to log messages of severity kINFO
454
+ //!
455
+ //! Example usage:
456
+ //!
457
+ //! LOG_INFO(logger) << "hello world" << std::endl;
458
+ //!
459
+ inline LogStreamConsumer LOG_INFO(const Logger& logger)
460
+ {
461
+ return LogStreamConsumer(logger.getReportableSeverity(), Severity::kINFO);
462
+ }
463
+
464
+ //!
465
+ //! \brief produces a LogStreamConsumer object that can be used to log messages of severity kWARNING
466
+ //!
467
+ //! Example usage:
468
+ //!
469
+ //! LOG_WARN(logger) << "hello world" << std::endl;
470
+ //!
471
+ inline LogStreamConsumer LOG_WARN(const Logger& logger)
472
+ {
473
+ return LogStreamConsumer(logger.getReportableSeverity(), Severity::kWARNING);
474
+ }
475
+
476
+ //!
477
+ //! \brief produces a LogStreamConsumer object that can be used to log messages of severity kERROR
478
+ //!
479
+ //! Example usage:
480
+ //!
481
+ //! LOG_ERROR(logger) << "hello world" << std::endl;
482
+ //!
483
+ inline LogStreamConsumer LOG_ERROR(const Logger& logger)
484
+ {
485
+ return LogStreamConsumer(logger.getReportableSeverity(), Severity::kERROR);
486
+ }
487
+
488
+ //!
489
+ //! \brief produces a LogStreamConsumer object that can be used to log messages of severity kINTERNAL_ERROR
490
+ // ("fatal" severity)
491
+ //!
492
+ //! Example usage:
493
+ //!
494
+ //! LOG_FATAL(logger) << "hello world" << std::endl;
495
+ //!
496
+ inline LogStreamConsumer LOG_FATAL(const Logger& logger)
497
+ {
498
+ return LogStreamConsumer(logger.getReportableSeverity(), Severity::kINTERNAL_ERROR);
499
+ }
500
+
501
+ } // anonymous namespace
502
+
503
+ #endif // TENSORRT_LOGGING_H
demo/TensorRT/cpp/yolox.cpp ADDED
@@ -0,0 +1,554 @@
1
+ #include <fstream>
2
+ #include <iostream>
3
+ #include <sstream>
4
+ #include <numeric>
5
+ #include <chrono>
6
+ #include <vector>
7
+ #include <opencv2/opencv.hpp>
8
+ #include <dirent.h>
9
+ #include "NvInfer.h"
10
+ #include "cuda_runtime_api.h"
11
+ #include "logging.h"
12
+
13
+ #define CHECK(status) \
14
+ do\
15
+ {\
16
+ auto ret = (status);\
17
+ if (ret != 0)\
18
+ {\
19
+ std::cerr << "Cuda failure: " << ret << std::endl;\
20
+ abort();\
21
+ }\
22
+ } while (0)
23
+
24
+ #define DEVICE 0 // GPU id
25
+ #define NMS_THRESH 0.65
26
+ #define BBOX_CONF_THRESH 0.3
27
+
28
+ using namespace nvinfer1;
29
+
30
+ // stuff we know about the network and the input/output blobs
31
+ static const int INPUT_W = 640;
32
+ static const int INPUT_H = 640;
33
+ const char* INPUT_BLOB_NAME = "input_0";
34
+ const char* OUTPUT_BLOB_NAME = "output_0";
35
+ static Logger gLogger;
36
+
37
+ cv::Mat static_resize(cv::Mat& img) {
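+ // letterbox preprocessing: keep the aspect ratio, resize into the 640x640 network input, pad the bottom/right with 114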
38
+ float r = std::min(INPUT_W / (img.cols*1.0), INPUT_H / (img.rows*1.0));
39
+ // r = std::min(r, 1.0f);
40
+ int unpad_w = r * img.cols;
41
+ int unpad_h = r * img.rows;
42
+ cv::Mat re(unpad_h, unpad_w, CV_8UC3);
43
+ cv::resize(img, re, re.size());
44
+ cv::Mat out(INPUT_W, INPUT_H, CV_8UC3, cv::Scalar(114, 114, 114));
45
+ re.copyTo(out(cv::Rect(0, 0, re.cols, re.rows)));
46
+ return out;
47
+ }
48
+
49
+ struct Object
50
+ {
51
+ cv::Rect_<float> rect;
52
+ int label;
53
+ float prob;
54
+ };
55
+
56
+ struct GridAndStride
57
+ {
58
+ int grid0;
59
+ int grid1;
60
+ int stride;
61
+ };
62
+
63
+ static void generate_grids_and_stride(const int target_size, std::vector<int>& strides, std::vector<GridAndStride>& grid_strides)
64
+ {
65
+ for (auto stride : strides)
66
+ {
67
+ int num_grid = target_size / stride;
68
+ for (int g1 = 0; g1 < num_grid; g1++)
69
+ {
70
+ for (int g0 = 0; g0 < num_grid; g0++)
71
+ {
72
+ grid_strides.push_back(GridAndStride{g0, g1, stride});
73
+ }
74
+ }
75
+ }
76
+ }
77
+
78
+ static inline float intersection_area(const Object& a, const Object& b)
79
+ {
80
+ cv::Rect_<float> inter = a.rect & b.rect;
81
+ return inter.area();
82
+ }
83
+
84
+ static void qsort_descent_inplace(std::vector<Object>& faceobjects, int left, int right)
85
+ {
86
+ int i = left;
87
+ int j = right;
88
+ float p = faceobjects[(left + right) / 2].prob;
89
+
90
+ while (i <= j)
91
+ {
92
+ while (faceobjects[i].prob > p)
93
+ i++;
94
+
95
+ while (faceobjects[j].prob < p)
96
+ j--;
97
+
98
+ if (i <= j)
99
+ {
100
+ // swap
101
+ std::swap(faceobjects[i], faceobjects[j]);
102
+
103
+ i++;
104
+ j--;
105
+ }
106
+ }
107
+
108
+ #pragma omp parallel sections
109
+ {
110
+ #pragma omp section
111
+ {
112
+ if (left < j) qsort_descent_inplace(faceobjects, left, j);
113
+ }
114
+ #pragma omp section
115
+ {
116
+ if (i < right) qsort_descent_inplace(faceobjects, i, right);
117
+ }
118
+ }
119
+ }
120
+
121
+ static void qsort_descent_inplace(std::vector<Object>& objects)
122
+ {
123
+ if (objects.empty())
124
+ return;
125
+
126
+ qsort_descent_inplace(objects, 0, objects.size() - 1);
127
+ }
128
+
129
+ static void nms_sorted_bboxes(const std::vector<Object>& faceobjects, std::vector<int>& picked, float nms_threshold)
130
+ {
131
+ picked.clear();
132
+
133
+ const int n = faceobjects.size();
134
+
135
+ std::vector<float> areas(n);
136
+ for (int i = 0; i < n; i++)
137
+ {
138
+ areas[i] = faceobjects[i].rect.area();
139
+ }
140
+
141
+ for (int i = 0; i < n; i++)
142
+ {
143
+ const Object& a = faceobjects[i];
144
+
145
+ int keep = 1;
146
+ for (int j = 0; j < (int)picked.size(); j++)
147
+ {
148
+ const Object& b = faceobjects[picked[j]];
149
+
150
+ // intersection over union
151
+ float inter_area = intersection_area(a, b);
152
+ float union_area = areas[i] + areas[picked[j]] - inter_area;
153
+ // float IoU = inter_area / union_area
154
+ if (inter_area / union_area > nms_threshold)
155
+ keep = 0;
156
+ }
157
+
158
+ if (keep)
159
+ picked.push_back(i);
160
+ }
161
+ }
162
+
163
+
164
+ static void generate_yolox_proposals(std::vector<GridAndStride> grid_strides, float* feat_blob, float prob_threshold, std::vector<Object>& objects)
165
+ {
166
+ const int num_class = 80;
167
+
168
+ const int num_anchors = grid_strides.size();
169
+
170
+ for (int anchor_idx = 0; anchor_idx < num_anchors; anchor_idx++)
171
+ {
172
+ const int grid0 = grid_strides[anchor_idx].grid0;
173
+ const int grid1 = grid_strides[anchor_idx].grid1;
174
+ const int stride = grid_strides[anchor_idx].stride;
175
+
176
+ const int basic_pos = anchor_idx * 85;
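+ // each anchor occupies 85 consecutive floats: cx, cy, w, h, objectness, then 80 class scores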
177
+
178
+ // yolox/models/yolo_head.py decode logic
179
+ float x_center = (feat_blob[basic_pos+0] + grid0) * stride;
180
+ float y_center = (feat_blob[basic_pos+1] + grid1) * stride;
181
+ float w = exp(feat_blob[basic_pos+2]) * stride;
182
+ float h = exp(feat_blob[basic_pos+3]) * stride;
183
+ float x0 = x_center - w * 0.5f;
184
+ float y0 = y_center - h * 0.5f;
185
+
186
+ float box_objectness = feat_blob[basic_pos+4];
187
+ for (int class_idx = 0; class_idx < num_class; class_idx++)
188
+ {
189
+ float box_cls_score = feat_blob[basic_pos + 5 + class_idx];
190
+ float box_prob = box_objectness * box_cls_score;
191
+ if (box_prob > prob_threshold)
192
+ {
193
+ Object obj;
194
+ obj.rect.x = x0;
195
+ obj.rect.y = y0;
196
+ obj.rect.width = w;
197
+ obj.rect.height = h;
198
+ obj.label = class_idx;
199
+ obj.prob = box_prob;
200
+
201
+ objects.push_back(obj);
202
+ }
203
+
204
+ } // class loop
205
+
206
+ } // point anchor loop
207
+ }
208
+
209
+ float* blobFromImage(cv::Mat& img){
210
+ cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
211
+
212
+ float* blob = new float[img.total()*3];
213
+ int channels = 3;
214
+ int img_h = 640;
215
+ int img_w = 640;
216
+ std::vector<float> mean = {0.485, 0.456, 0.406};
217
+ std::vector<float> std = {0.229, 0.224, 0.225};
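+ // write the blob in CHW order: scale each RGB pixel to [0,1], then normalize with ImageNet mean/std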
218
+ for (size_t c = 0; c < channels; c++)
219
+ {
220
+ for (size_t h = 0; h < img_h; h++)
221
+ {
222
+ for (size_t w = 0; w < img_w; w++)
223
+ {
224
+ blob[c * img_w * img_h + h * img_w + w] =
225
+ (((float)img.at<cv::Vec3b>(h, w)[c]) / 255.0f - mean[c]) / std[c];
226
+ }
227
+ }
228
+ }
229
+ return blob;
230
+ }
231
+
232
+
233
+ int read_files_in_dir(const char *p_dir_name, std::vector<std::string> &file_names) {
234
+ DIR *p_dir = opendir(p_dir_name);
235
+ if (p_dir == nullptr) {
236
+ return -1;
237
+ }
238
+
239
+ struct dirent* p_file = nullptr;
240
+ while ((p_file = readdir(p_dir)) != nullptr) {
241
+ if (strcmp(p_file->d_name, ".") != 0 &&
242
+ strcmp(p_file->d_name, "..") != 0) {
243
+ std::string cur_file_name(p_file->d_name);
244
+ file_names.push_back(cur_file_name);
245
+ }
246
+ }
247
+
248
+ closedir(p_dir);
249
+ return 0;
250
+ }
251
+
252
+ static void decode_outputs(float* prob, std::vector<Object>& objects, float scale, const int img_w, const int img_h) {
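+ // decode raw predictions into proposals, run NMS, then rescale the kept boxes to the original image and clip them to its bounds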
253
+ std::vector<Object> proposals;
254
+ std::vector<int> strides = {8, 16, 32};
255
+ std::vector<GridAndStride> grid_strides;
256
+ generate_grids_and_stride(INPUT_W, strides, grid_strides);
257
+ generate_yolox_proposals(grid_strides, prob, BBOX_CONF_THRESH, proposals);
258
+ std::cout << "num of boxes before nms: " << proposals.size() << std::endl;
259
+
260
+ qsort_descent_inplace(proposals);
261
+
262
+ std::vector<int> picked;
263
+ nms_sorted_bboxes(proposals, picked, NMS_THRESH);
264
+
265
+
266
+ int count = picked.size();
267
+
268
+ std::cout << "num of boxes: " << count << std::endl;
269
+
270
+ objects.resize(count);
271
+ for (int i = 0; i < count; i++)
272
+ {
273
+ objects[i] = proposals[picked[i]];
274
+
275
+ // adjust offset to original unpadded
276
+ float x0 = (objects[i].rect.x) / scale;
277
+ float y0 = (objects[i].rect.y) / scale;
278
+ float x1 = (objects[i].rect.x + objects[i].rect.width) / scale;
279
+ float y1 = (objects[i].rect.y + objects[i].rect.height) / scale;
280
+
281
+ // clip
282
+ x0 = std::max(std::min(x0, (float)(img_w - 1)), 0.f);
283
+ y0 = std::max(std::min(y0, (float)(img_h - 1)), 0.f);
284
+ x1 = std::max(std::min(x1, (float)(img_w - 1)), 0.f);
285
+ y1 = std::max(std::min(y1, (float)(img_h - 1)), 0.f);
286
+
287
+ objects[i].rect.x = x0;
288
+ objects[i].rect.y = y0;
289
+ objects[i].rect.width = x1 - x0;
290
+ objects[i].rect.height = y1 - y0;
291
+ }
292
+ }
293
+
294
+ const float color_list[80][3] =
295
+ {
296
+ {0.000, 0.447, 0.741},
297
+ {0.850, 0.325, 0.098},
298
+ {0.929, 0.694, 0.125},
299
+ {0.494, 0.184, 0.556},
300
+ {0.466, 0.674, 0.188},
301
+ {0.301, 0.745, 0.933},
302
+ {0.635, 0.078, 0.184},
303
+ {0.300, 0.300, 0.300},
304
+ {0.600, 0.600, 0.600},
305
+ {1.000, 0.000, 0.000},
306
+ {1.000, 0.500, 0.000},
307
+ {0.749, 0.749, 0.000},
308
+ {0.000, 1.000, 0.000},
309
+ {0.000, 0.000, 1.000},
310
+ {0.667, 0.000, 1.000},
311
+ {0.333, 0.333, 0.000},
312
+ {0.333, 0.667, 0.000},
313
+ {0.333, 1.000, 0.000},
314
+ {0.667, 0.333, 0.000},
315
+ {0.667, 0.667, 0.000},
316
+ {0.667, 1.000, 0.000},
317
+ {1.000, 0.333, 0.000},
318
+ {1.000, 0.667, 0.000},
319
+ {1.000, 1.000, 0.000},
320
+ {0.000, 0.333, 0.500},
321
+ {0.000, 0.667, 0.500},
322
+ {0.000, 1.000, 0.500},
323
+ {0.333, 0.000, 0.500},
324
+ {0.333, 0.333, 0.500},
325
+ {0.333, 0.667, 0.500},
326
+ {0.333, 1.000, 0.500},
327
+ {0.667, 0.000, 0.500},
328
+ {0.667, 0.333, 0.500},
329
+ {0.667, 0.667, 0.500},
330
+ {0.667, 1.000, 0.500},
331
+ {1.000, 0.000, 0.500},
332
+ {1.000, 0.333, 0.500},
333
+ {1.000, 0.667, 0.500},
334
+ {1.000, 1.000, 0.500},
335
+ {0.000, 0.333, 1.000},
336
+ {0.000, 0.667, 1.000},
337
+ {0.000, 1.000, 1.000},
338
+ {0.333, 0.000, 1.000},
339
+ {0.333, 0.333, 1.000},
340
+ {0.333, 0.667, 1.000},
341
+ {0.333, 1.000, 1.000},
342
+ {0.667, 0.000, 1.000},
343
+ {0.667, 0.333, 1.000},
344
+ {0.667, 0.667, 1.000},
345
+ {0.667, 1.000, 1.000},
346
+ {1.000, 0.000, 1.000},
347
+ {1.000, 0.333, 1.000},
348
+ {1.000, 0.667, 1.000},
349
+ {0.333, 0.000, 0.000},
350
+ {0.500, 0.000, 0.000},
351
+ {0.667, 0.000, 0.000},
352
+ {0.833, 0.000, 0.000},
353
+ {1.000, 0.000, 0.000},
354
+ {0.000, 0.167, 0.000},
355
+ {0.000, 0.333, 0.000},
356
+ {0.000, 0.500, 0.000},
357
+ {0.000, 0.667, 0.000},
358
+ {0.000, 0.833, 0.000},
359
+ {0.000, 1.000, 0.000},
360
+ {0.000, 0.000, 0.167},
361
+ {0.000, 0.000, 0.333},
362
+ {0.000, 0.000, 0.500},
363
+ {0.000, 0.000, 0.667},
364
+ {0.000, 0.000, 0.833},
365
+ {0.000, 0.000, 1.000},
366
+ {0.000, 0.000, 0.000},
367
+ {0.143, 0.143, 0.143},
368
+ {0.286, 0.286, 0.286},
369
+ {0.429, 0.429, 0.429},
370
+ {0.571, 0.571, 0.571},
371
+ {0.714, 0.714, 0.714},
372
+ {0.857, 0.857, 0.857},
373
+ {0.000, 0.447, 0.741},
374
+ {0.314, 0.717, 0.741},
375
+ {0.50, 0.5, 0}
376
+ };
377
+
378
+ static void draw_objects(const cv::Mat& bgr, const std::vector<Object>& objects, std::string f)
379
+ {
380
+ static const char* class_names[] = {
381
+ "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
382
+ "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
383
+ "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
384
+ "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
385
+ "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
386
+ "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
387
+ "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
388
+ "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
389
+ "hair drier", "toothbrush"
390
+ };
391
+
392
+ cv::Mat image = bgr.clone();
393
+
394
+ for (size_t i = 0; i < objects.size(); i++)
395
+ {
396
+ const Object& obj = objects[i];
397
+
398
+ fprintf(stderr, "%d = %.5f at %.2f %.2f %.2f x %.2f\n", obj.label, obj.prob,
399
+ obj.rect.x, obj.rect.y, obj.rect.width, obj.rect.height);
400
+
401
+ cv::Scalar color = cv::Scalar(color_list[obj.label][0], color_list[obj.label][1], color_list[obj.label][2]);
402
+ float c_mean = cv::mean(color)[0];
403
+ cv::Scalar txt_color;
404
+ if (c_mean > 0.5){
405
+ txt_color = cv::Scalar(0, 0, 0);
406
+ }else{
407
+ txt_color = cv::Scalar(255, 255, 255);
408
+ }
409
+
410
+ cv::rectangle(image, obj.rect, color * 255, 2);
411
+
412
+ char text[256];
413
+ sprintf(text, "%s %.1f%%", class_names[obj.label], obj.prob * 100);
414
+
415
+ int baseLine = 0;
416
+ cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_COMPLEX, 0.4, 1, &baseLine);
417
+
418
+ cv::Scalar txt_bk_color = color * 0.7 * 255;
419
+
420
+ int x = obj.rect.x;
421
+ int y = obj.rect.y + 1;
422
+ //int y = obj.rect.y - label_size.height - baseLine;
423
+ if (y > image.rows)
424
+ y = image.rows;
425
+ //if (x + label_size.width > image.cols)
426
+ //x = image.cols - label_size.width;
427
+
428
+ cv::rectangle(image, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)),
429
+ txt_bk_color, -1);
430
+
431
+ cv::putText(image, text, cv::Point(x, y + label_size.height),
432
+ cv::FONT_HERSHEY_COMPLEX, 0.4, txt_color, 1);
433
+ }
434
+
435
+ cv::imwrite("_" + f, image);
436
+ fprintf(stderr, "save vis file\n");
437
+ /* cv::imshow("image", image); */
438
+ /* cv::waitKey(0); */
439
+ }
440
+
441
+
442
+ void doInference(IExecutionContext& context, float* input, float* output, const int output_size, cv::Size input_shape) {
443
+ const ICudaEngine& engine = context.getEngine();
444
+
445
+ // Pointers to input and output device buffers to pass to engine.
446
+ // Engine requires exactly IEngine::getNbBindings() number of buffers.
447
+ assert(engine.getNbBindings() == 2);
448
+ void* buffers[2];
449
+
450
+ // In order to bind the buffers, we need to know the names of the input and output tensors.
451
+ // Note that indices are guaranteed to be less than IEngine::getNbBindings()
452
+ const int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME);
453
+
454
+ assert(engine.getBindingDataType(inputIndex) == nvinfer1::DataType::kFLOAT);
455
+ const int outputIndex = engine.getBindingIndex(OUTPUT_BLOB_NAME);
456
+ assert(engine.getBindingDataType(outputIndex) == nvinfer1::DataType::kFLOAT);
457
+ int mBatchSize = engine.getMaxBatchSize();
458
+
459
+ // Create GPU buffers on device
460
+ CHECK(cudaMalloc(&buffers[inputIndex], 3 * input_shape.height * input_shape.width * sizeof(float)));
461
+ CHECK(cudaMalloc(&buffers[outputIndex], output_size*sizeof(float)));
462
+
463
+ // Create stream
464
+ cudaStream_t stream;
465
+ CHECK(cudaStreamCreate(&stream));
466
+
467
+ // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
468
+ CHECK(cudaMemcpyAsync(buffers[inputIndex], input, 3 * input_shape.height * input_shape.width * sizeof(float), cudaMemcpyHostToDevice, stream));
469
+ context.enqueue(1, buffers, stream, nullptr);
470
+ CHECK(cudaMemcpyAsync(output, buffers[outputIndex], output_size * sizeof(float), cudaMemcpyDeviceToHost, stream));
471
+ cudaStreamSynchronize(stream);
472
+
473
+ // Release stream and buffers
474
+ cudaStreamDestroy(stream);
475
+ CHECK(cudaFree(buffers[inputIndex]));
476
+ CHECK(cudaFree(buffers[outputIndex]));
477
+ }
478
+
479
+ int main(int argc, char** argv) {
480
+ cudaSetDevice(DEVICE);
481
+ // create a model using the API directly and serialize it to a stream
482
+ char *trtModelStream{nullptr};
483
+ size_t size{0};
484
+
485
+ if (argc == 3 && std::string(argv[1]) == "-d") {
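+ // '-d' mode: deserialize a prebuilt engine; model_trt.engine is expected in the current working directory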
486
+ std::ifstream file("model_trt.engine", std::ios::binary);
487
+ if (file.good()) {
488
+ file.seekg(0, file.end);
489
+ size = file.tellg();
490
+ file.seekg(0, file.beg);
491
+ trtModelStream = new char[size];
492
+ assert(trtModelStream);
493
+ file.read(trtModelStream, size);
494
+ file.close();
495
+ }
496
+ } else {
497
+ std::cerr << "arguments not right!" << std::endl;
498
+ std::cerr << "run 'python3 yolox/deploy/trt.py -n yolox-{tiny, s, m, l, x}' to serialize model first!" << std::endl;
499
+ std::cerr << "./yolox -d ../samples // deserialize file and run inference" << std::endl;
500
+ return -1;
501
+ }
502
+
503
+ std::vector<std::string> file_names;
504
+ if (read_files_in_dir(argv[2], file_names) < 0) {
505
+ std::cout << "read_files_in_dir failed." << std::endl;
506
+ return -1;
507
+ }
508
+
509
+ IRuntime* runtime = createInferRuntime(gLogger);
510
+ assert(runtime != nullptr);
511
+ ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size);
512
+ assert(engine != nullptr);
513
+ IExecutionContext* context = engine->createExecutionContext();
514
+ assert(context != nullptr);
515
+ delete[] trtModelStream;
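+ // binding 1 is assumed to be the output tensor; multiply its dimensions to get the flattened element count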
516
+ auto out_dims = engine->getBindingDimensions(1);
517
+ auto output_size = 1;
518
+ for(int j=0;j<out_dims.nbDims;j++) {
519
+ output_size *= out_dims.d[j];
520
+ }
521
+ static float* prob = new float[output_size];
522
+
523
+ int fcount = 0;
524
+ for (auto f: file_names) {
525
+ fcount++;
526
+ std::cout << fcount << " " << f << std::endl;
527
+ cv::Mat img = cv::imread(std::string(argv[2]) + "/" + f);
528
+ if (img.empty()) continue;
529
+ int img_w = img.cols;
530
+ int img_h = img.rows;
531
+ cv::Mat pr_img = static_resize(img);
532
+ std::cout << "blob image" << std::endl;
533
+
534
+ float* blob;
535
+ blob = blobFromImage(pr_img);
536
+ float scale = std::min(INPUT_W / (img.cols*1.0), INPUT_H / (img.rows*1.0));
537
+
538
+ // Run inference
539
+ auto start = std::chrono::system_clock::now();
540
+ doInference(*context, blob, prob, output_size, pr_img.size());
541
+ auto end = std::chrono::system_clock::now();
542
+ std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
543
+
544
+ std::vector<Object> objects;
545
+ decode_outputs(prob, objects, scale, img_w, img_h);
546
+ draw_objects(img, objects, f);
+ delete[] blob;
547
+ }
548
+
549
+ // Destroy the engine
550
+ context->destroy();
551
+ engine->destroy();
552
+ runtime->destroy();
553
+ return 0;
554
+ }
demo/TensorRT/python/README.md ADDED
@@ -0,0 +1,46 @@
1
+ # User Guide for Deploying YOLOX on TensorRT
2
+
3
+ This tutorial includes a Python demo for TensorRT.
4
+
5
+ ## Install TensorRT Toolkit
6
+
7
+ Please follow the [TensorRT Installation Guide](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html) and the [torch2trt repo](https://github.com/NVIDIA-AI-IOT/torch2trt) to install TensorRT and torch2trt.
8
+
9
+ ## Convert model
10
+
11
+ YOLOX models can be easily converted to TensorRT models using torch2trt.
12
+
13
+ If you want to convert one of our standard models, use the flag -n to specify the model name:
14
+ ```shell
15
+ python tools/deploy/trt.py -n <YOLOX_MODEL_NAME> -c <YOLOX_CHECKPOINT>
16
+ ```
17
+ For example:
18
+ ```shell
19
+ python tools/deploy/trt.py -n yolox-s -c your_ckpt.pth.tar
20
+ ```
21
+ <YOLOX_MODEL_NAME> can be: yolox-nano, yolox-tiny, yolox-s, yolox-m, yolox-l, yolox-x.
22
+
23
+ If you want to convert your customized model, use the flag -f to specify your exp file:
24
+ ```shell
25
+ python tools/deploy/trt.py -f <YOLOX_EXP_FILE> -c <YOLOX_CHECKPOINT>
26
+ ```
27
+ For example:
28
+ ```shell
29
+ python tools/deploy/trt.py -f /path/to/your/yolox/exps/yolox_s.py -c your_ckpt.pth.tar
30
+ ```
31
+ *yolox_s.py* can be any exp file you have modified.
32
+
33
+ The converted model and the serialized engine file (for the C++ demo) will be saved in your experiment output directory.
34
+
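+ For reference, the conversion follows the standard torch2trt workflow. The sketch below is only illustrative (the function name, default paths, and the `decode_in_inference` attribute are assumptions, not the exact contents of *trt.py*):
+
+ ```python
+ import torch
+ from torch2trt import torch2trt
+
+ def convert_to_trt(model, engine_path="model_trt.engine", ckpt_path="model_trt.pth", size=640):
+     """Illustrative sketch: convert a YOLOX torch model to TensorRT with torch2trt."""
+     model = model.eval().cuda()
+     # keep the head output raw; the Python/C++ demos decode boxes themselves (assumed attribute)
+     model.head.decode_in_inference = False
+     x = torch.ones(1, 3, size, size).cuda()  # dummy input used for conversion
+     model_trt = torch2trt(model, [x], fp16_mode=True, max_workspace_size=(1 << 32))
+     torch.save(model_trt.state_dict(), ckpt_path)  # reloadable via torch2trt.TRTModule
+     with open(engine_path, "wb") as f:  # serialized engine consumed by the C++ demo
+         f.write(model_trt.engine.serialize())
+     return model_trt
+ ```
+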
35
+ ## Demo
36
+
37
+ The TensorRT Python demo is merged into our PyTorch demo file, so you can run the PyTorch demo command with the ```--trt``` flag.
38
+
39
+ ```shell
40
+ python tools/demo.py -n yolox-s --trt --conf 0.3 --nms 0.65 --tsize 640
41
+ ```
42
+ or
43
+ ```shell
44
+ python tools/demo.py -f exps/base/yolox_s.py --trt --conf 0.3 --nms 0.65 --tsize 640
45
+ ```
46
+