haolongzhangm committed · a6a308d
Parent(s): c4f16a5

feat(demo): add MegEngine demo example (#146)
Files changed:
- README.md +5 -4
- demo/MegEngine/cpp/README.md +122 -0
- demo/MegEngine/cpp/build.sh +61 -0
- demo/MegEngine/cpp/yolox.cpp +473 -0
- demo/MegEngine/python/README.md +33 -0
- demo/MegEngine/python/build.py +54 -0
- demo/MegEngine/python/coco_classes.py +86 -0
- demo/MegEngine/python/convert_weights.py +64 -0
- demo/MegEngine/python/demo.py +202 -0
- demo/MegEngine/python/dump.py +51 -0
- demo/MegEngine/python/models/__init__.py +9 -0
- demo/MegEngine/python/models/darknet.py +154 -0
- demo/MegEngine/python/models/network_blocks.py +183 -0
- demo/MegEngine/python/models/yolo_fpn.py +78 -0
- demo/MegEngine/python/models/yolo_head.py +193 -0
- demo/MegEngine/python/models/yolo_pafpn.py +111 -0
- demo/MegEngine/python/models/yolox.py +34 -0
- demo/MegEngine/python/process.py +76 -0
- demo/MegEngine/python/visualize.py +128 -0
README.md CHANGED

```diff
@@ -152,10 +152,11 @@ python tools/eval.py -n yolox-s -c yolox_s.pth.tar -b 1 -d 1 --conf 0.001 --fp1
 ## Deployment
 
-1. [ONNX export and an ONNXRuntime](./demo/ONNXRuntime)
-2. [TensorRT in C++ and Python](./demo/TensorRT)
-3. [ncnn in C++ and Java](./demo/ncnn)
-4. [OpenVINO in C++ and Python](./demo/OpenVINO)
+1. [MegEngine in C++ and Python](./demo/MegEngine)
+2. [ONNX export and an ONNXRuntime](./demo/ONNXRuntime)
+3. [TensorRT in C++ and Python](./demo/TensorRT)
+4. [ncnn in C++ and Java](./demo/ncnn)
+5. [OpenVINO in C++ and Python](./demo/OpenVINO)
 
 ## Third-party resources
```
demo/MegEngine/cpp/README.md ADDED (+122 lines)
````markdown
# YOLOX-CPP-MegEngine

C++ demo of YOLOX object detection based on [MegEngine](https://github.com/MegEngine/MegEngine).

## Tutorial

### Step 1: install the toolchain

* host: `sudo apt install gcc g++ build-essential git git-lfs gfortran libgfortran-6-dev autoconf gnupg flex bison gperf curl zlib1g-dev gcc-multilib g++-multilib cmake` (gcc/g++ version >= 6)
* cross build for Android: download the [NDK](https://developer.android.com/ndk/downloads)
  * after unzipping the NDK, `export NDK_ROOT="path of NDK"`

### Step 2: build MegEngine

* `git clone https://github.com/MegEngine/MegEngine.git`
* init third_party:
  * `export megengine_root="path of MegEngine"`
  * `cd $megengine_root && ./third_party/prepare.sh && ./third_party/install-mkl.sh`
* build:
  * host without CUDA: `./scripts/cmake-build/host_build.sh`
  * host with CUDA: `./scripts/cmake-build/host_build.sh -c`
  * cross build for Android aarch64: `./scripts/cmake-build/cross_build_android_arm_inference.sh`
  * cross build for Android aarch64 (with ARMv8.2 + fp16): `./scripts/cmake-build/cross_build_android_arm_inference.sh -f`
* after building MegEngine, export `MGE_INSTALL_PATH`:
  * host without CUDA: `export MGE_INSTALL_PATH=${megengine_root}/build_dir/host/MGE_WITH_CUDA_OFF/MGE_INFERENCE_ONLY_ON/Release/install`
  * host with CUDA: `export MGE_INSTALL_PATH=${megengine_root}/build_dir/host/MGE_WITH_CUDA_ON/MGE_INFERENCE_ONLY_ON/Release/install`
  * cross build for Android aarch64: `export MGE_INSTALL_PATH=${megengine_root}/build_dir/android/arm64-v8a/Release/install`
* you can refer to the [MegEngine build tutorial](https://github.com/MegEngine/MegEngine/blob/master/scripts/cmake-build/BUILD_README.md) to build for other platforms, e.g. Windows/macOS.

### Step 3: build OpenCV

* `git clone https://github.com/opencv/opencv.git`
* `git checkout 3.4.15` (we tested 3.4.15; other versions may need build tweaks)
* patch diff for Android:

  ```
  diff --git a/CMakeLists.txt b/CMakeLists.txt
  index f6a2da5310..10354312c9 100644
  --- a/CMakeLists.txt
  +++ b/CMakeLists.txt
  @@ -643,7 +643,7 @@ if(UNIX)
     if(NOT APPLE)
       CHECK_INCLUDE_FILE(pthread.h HAVE_PTHREAD)
       if(ANDROID)
  -      set(OPENCV_LINKER_LIBS ${OPENCV_LINKER_LIBS} dl m log)
  +      set(OPENCV_LINKER_LIBS ${OPENCV_LINKER_LIBS} dl m log z)
     elseif(CMAKE_SYSTEM_NAME MATCHES "FreeBSD|NetBSD|DragonFly|OpenBSD|Haiku")
       set(OPENCV_LINKER_LIBS ${OPENCV_LINKER_LIBS} m pthread)
     elseif(EMSCRIPTEN)
  ```

* build for host:

  ```
  cd root_dir_of_opencv
  mkdir -p build/install
  cd build
  cmake -DBUILD_JAVA=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_INSTALL_PREFIX=$PWD/install ..
  make install -j32
  ```

* build for android-aarch64:

  ```
  cd root_dir_of_opencv
  mkdir -p build_android/install
  cd build_android
  cmake -DCMAKE_TOOLCHAIN_FILE="$NDK_ROOT/build/cmake/android.toolchain.cmake" -DANDROID_NDK="$NDK_ROOT" -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=21 -DBUILD_JAVA=OFF -DBUILD_ANDROID_PROJECTS=OFF -DBUILD_ANDROID_EXAMPLES=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_INSTALL_PREFIX=$PWD/install ..
  make install -j32
  ```

* after building OpenCV, export `OPENCV_INSTALL_INCLUDE_PATH` and `OPENCV_INSTALL_LIB_PATH`:
  * host build:
    * `export OPENCV_INSTALL_INCLUDE_PATH=${path of opencv}/build/install/include`
    * `export OPENCV_INSTALL_LIB_PATH=${path of opencv}/build/install/lib`
  * cross build for Android aarch64:
    * `export OPENCV_INSTALL_INCLUDE_PATH=${path of opencv}/build_android/install/sdk/native/jni/include`
    * `export OPENCV_INSTALL_LIB_PATH=${path of opencv}/build_android/install/sdk/native/libs/arm64-v8a`

### Step 4: build the demo

* run build.sh
  * host: `export CXX=g++`, then `./build.sh`
  * cross Android aarch64: `export CXX=aarch64-linux-android21-clang++`, then `./build.sh`

### Step 5: run the demo

> **Note**: two ways to get the `yolox_s.mge` model file
>
> * refer to the python demo's `dump.py` script.
> * `wget https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_s.mge`

* host:
  * `LD_LIBRARY_PATH=$MGE_INSTALL_PATH/lib/:$OPENCV_INSTALL_LIB_PATH ./yolox yolox_s.mge ../../../assets/dog.jpg cuda/cpu/multithread <warmup_count> <thread_number>`
* cross Android:
  * `adb push`/`scp` `$MGE_INSTALL_PATH/lib/libmegengine.so` to the phone
  * `adb push`/`scp` `$OPENCV_INSTALL_LIB_PATH/*.so` to the phone
  * `adb push`/`scp` `./yolox` and `yolox_s.mge` to the phone
  * `adb push`/`scp` `../../../assets/dog.jpg` to the phone
  * log in to the phone via adb or ssh, then run: `LD_LIBRARY_PATH=. ./yolox yolox_s.mge dog.jpg cpu/multithread <warmup_count> <thread_number> <use_fast_run> <use_weight_preprocess> <run_with_fp16>`
* arguments:
  * `<warmup_count>`: warmup iterations, valid number >= 0
  * `<thread_number>`: thread count, valid number >= 1, only takes effect with the `multithread` device
  * `<use_fast_run>`: if >= 1, use fast-run to choose the best algorithm
  * `<use_weight_preprocess>`: if >= 1, preprocess weights before execution
  * `<run_with_fp16>`: if >= 1, run in fp16 mode

## Acknowledgement

* [MegEngine](https://github.com/MegEngine/MegEngine)
* [OpenCV](https://github.com/opencv/opencv)
* [NDK](https://developer.android.com/ndk)
* [CMake](https://cmake.org/)
````
demo/MegEngine/cpp/build.sh ADDED (+61 lines)
```bash
#!/usr/bin/env bash
set -e

if [ -z "$CXX" ]; then
    echo "please export your c++ toolchain to CXX"
    echo "for example:"
    echo "build for host: export CXX=g++"
    echo "cross build for aarch64-android (located in the NDK): export CXX=aarch64-linux-android21-clang++"
    echo "cross build for aarch64-linux: export CXX=aarch64-linux-gnu-g++"
    exit 1
fi

if [ -z "$MGE_INSTALL_PATH" ]; then
    echo "please refer to ./README.md to init the MGE_INSTALL_PATH env"
    exit 1
fi

if [ -z "$OPENCV_INSTALL_INCLUDE_PATH" ]; then
    echo "please refer to ./README.md to init the OPENCV_INSTALL_INCLUDE_PATH env"
    exit 1
fi

if [ -z "$OPENCV_INSTALL_LIB_PATH" ]; then
    echo "please refer to ./README.md to init the OPENCV_INSTALL_LIB_PATH env"
    exit 1
fi

INCLUDE_FLAG="-I$MGE_INSTALL_PATH/include -I$OPENCV_INSTALL_INCLUDE_PATH"
LINK_FLAG="-L$MGE_INSTALL_PATH/lib/ -lmegengine -L$OPENCV_INSTALL_LIB_PATH -lopencv_core -lopencv_highgui -lopencv_imgproc -lopencv_imgcodecs"
BUILD_FLAG="-static-libstdc++ -O3 -pie -fPIE -g"

if [[ $CXX =~ "android" ]]; then
    LINK_FLAG="${LINK_FLAG} -llog -lz"
fi

echo "CXX: $CXX"
echo "MGE_INSTALL_PATH: $MGE_INSTALL_PATH"
echo "INCLUDE_FLAG: $INCLUDE_FLAG"
echo "LINK_FLAG: $LINK_FLAG"
echo "BUILD_FLAG: $BUILD_FLAG"

# emit a minimal compile_commands.json so editors/clangd can index yolox.cpp
echo "[" > compile_commands.json
echo "{" >> compile_commands.json
echo "\"directory\": \"$PWD\"," >> compile_commands.json
echo "\"command\": \"$CXX yolox.cpp -o yolox ${INCLUDE_FLAG} ${LINK_FLAG}\"," >> compile_commands.json
echo "\"file\": \"$PWD/yolox.cpp\"" >> compile_commands.json
echo "}" >> compile_commands.json
echo "]" >> compile_commands.json
$CXX yolox.cpp -o yolox ${INCLUDE_FLAG} ${LINK_FLAG} ${BUILD_FLAG}

echo "build success, output file: yolox"
if [[ $CXX =~ "android" ]]; then
    echo "try command to run:"
    echo "adb push/scp $MGE_INSTALL_PATH/lib/libmegengine.so android_phone"
    echo "adb push/scp $OPENCV_INSTALL_LIB_PATH/*.so android_phone"
    echo "adb push/scp ./yolox yolox_s.mge android_phone"
    echo "adb push/scp ../../../assets/dog.jpg android_phone"
    echo "adb/ssh to android_phone, then run: LD_LIBRARY_PATH=. ./yolox yolox_s.mge dog.jpg cpu/multithread <warmup_count> <thread_number> <use_fast_run> <use_weight_preprocess>"
else
    echo "try command to run: LD_LIBRARY_PATH=$MGE_INSTALL_PATH/lib/:$OPENCV_INSTALL_LIB_PATH ./yolox yolox_s.mge ../../../assets/dog.jpg cuda/cpu/multithread <warmup_count> <thread_number> <use_fast_run> <use_weight_preprocess>"
fi
```
demo/MegEngine/cpp/yolox.cpp ADDED (+473 lines)
```cpp
// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#include "megbrain/gopt/inference.h"
#include "megbrain/opr/search_policy/algo_chooser_helper.h"
#include "megbrain/serialization/serializer.h"
#include <chrono> // std::chrono is used for timing in main(); include it explicitly
#include <iostream>
#include <iterator>
#include <memory>
#include <opencv2/opencv.hpp>
#include <stdlib.h>
#include <string>
#include <vector>

// detection thresholds
#define NMS_THRESH 0.65
#define BBOX_CONF_THRESH 0.3

constexpr int INPUT_W = 640;
constexpr int INPUT_H = 640;

using namespace mgb;

cv::Mat static_resize(cv::Mat &img) {
  float r = std::min(INPUT_W / (img.cols * 1.0), INPUT_H / (img.rows * 1.0));
  int unpad_w = r * img.cols;
  int unpad_h = r * img.rows;
  cv::Mat re(unpad_h, unpad_w, CV_8UC3);
  cv::resize(img, re, re.size());
  // cv::Mat takes (rows, cols); pad with the conventional gray value 114
  cv::Mat out(INPUT_H, INPUT_W, CV_8UC3, cv::Scalar(114, 114, 114));
  re.copyTo(out(cv::Rect(0, 0, re.cols, re.rows)));
  return out;
}

void blobFromImage(cv::Mat &img, float *blob_data) {
  cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
  int channels = 3;
  int img_h = img.rows;
  int img_w = img.cols;
  std::vector<float> mean = {0.485, 0.456, 0.406};
  std::vector<float> std = {0.229, 0.224, 0.225};
  for (int c = 0; c < channels; c++) {
    for (int h = 0; h < img_h; h++) {
      for (int w = 0; w < img_w; w++) {
        blob_data[c * img_w * img_h + h * img_w + w] =
            (((float)img.at<cv::Vec3b>(h, w)[c]) / 255.0f - mean[c]) / std[c];
      }
    }
  }
}

struct Object {
  cv::Rect_<float> rect;
  int label;
  float prob;
};

struct GridAndStride {
  int grid0;
  int grid1;
  int stride;
};

static void
generate_grids_and_stride(const int target_size, std::vector<int> &strides,
                          std::vector<GridAndStride> &grid_strides) {
  for (auto stride : strides) {
    int num_grid = target_size / stride;
    for (int g1 = 0; g1 < num_grid; g1++) {
      for (int g0 = 0; g0 < num_grid; g0++) {
        grid_strides.push_back((GridAndStride){g0, g1, stride});
      }
    }
  }
}

static void generate_yolox_proposals(std::vector<GridAndStride> grid_strides,
                                     const float *feat_ptr,
                                     float prob_threshold,
                                     std::vector<Object> &objects) {
  const int num_class = 80;
  const int num_anchors = grid_strides.size();

  for (int anchor_idx = 0; anchor_idx < num_anchors; anchor_idx++) {
    const int grid0 = grid_strides[anchor_idx].grid0;
    const int grid1 = grid_strides[anchor_idx].grid1;
    const int stride = grid_strides[anchor_idx].stride;

    const int basic_pos = anchor_idx * 85;

    float x_center = (feat_ptr[basic_pos + 0] + grid0) * stride;
    float y_center = (feat_ptr[basic_pos + 1] + grid1) * stride;
    float w = exp(feat_ptr[basic_pos + 2]) * stride;
    float h = exp(feat_ptr[basic_pos + 3]) * stride;
    float x0 = x_center - w * 0.5f;
    float y0 = y_center - h * 0.5f;

    float box_objectness = feat_ptr[basic_pos + 4];
    for (int class_idx = 0; class_idx < num_class; class_idx++) {
      float box_cls_score = feat_ptr[basic_pos + 5 + class_idx];
      float box_prob = box_objectness * box_cls_score;
      if (box_prob > prob_threshold) {
        Object obj;
        obj.rect.x = x0;
        obj.rect.y = y0;
        obj.rect.width = w;
        obj.rect.height = h;
        obj.label = class_idx;
        obj.prob = box_prob;

        objects.push_back(obj);
      }
    } // class loop
  } // point anchor loop
}

static inline float intersection_area(const Object &a, const Object &b) {
  cv::Rect_<float> inter = a.rect & b.rect;
  return inter.area();
}

static void qsort_descent_inplace(std::vector<Object> &faceobjects, int left,
                                  int right) {
  int i = left;
  int j = right;
  float p = faceobjects[(left + right) / 2].prob;

  while (i <= j) {
    while (faceobjects[i].prob > p)
      i++;

    while (faceobjects[j].prob < p)
      j--;

    if (i <= j) {
      // swap
      std::swap(faceobjects[i], faceobjects[j]);

      i++;
      j--;
    }
  }

#pragma omp parallel sections
  {
#pragma omp section
    {
      if (left < j)
        qsort_descent_inplace(faceobjects, left, j);
    }
#pragma omp section
    {
      if (i < right)
        qsort_descent_inplace(faceobjects, i, right);
    }
  }
}

static void qsort_descent_inplace(std::vector<Object> &objects) {
  if (objects.empty())
    return;

  qsort_descent_inplace(objects, 0, objects.size() - 1);
}

static void nms_sorted_bboxes(const std::vector<Object> &faceobjects,
                              std::vector<int> &picked, float nms_threshold) {
  picked.clear();

  const int n = faceobjects.size();

  std::vector<float> areas(n);
  for (int i = 0; i < n; i++) {
    areas[i] = faceobjects[i].rect.area();
  }

  for (int i = 0; i < n; i++) {
    const Object &a = faceobjects[i];

    int keep = 1;
    for (int j = 0; j < (int)picked.size(); j++) {
      const Object &b = faceobjects[picked[j]];

      // intersection over union
      float inter_area = intersection_area(a, b);
      float union_area = areas[i] + areas[picked[j]] - inter_area;
      // float IoU = inter_area / union_area
      if (inter_area / union_area > nms_threshold)
        keep = 0;
    }

    if (keep)
      picked.push_back(i);
  }
}

static void decode_outputs(const float *prob, std::vector<Object> &objects,
                           float scale, const int img_w, const int img_h) {
  std::vector<Object> proposals;
  std::vector<int> strides = {8, 16, 32};
  std::vector<GridAndStride> grid_strides;

  generate_grids_and_stride(INPUT_W, strides, grid_strides);
  generate_yolox_proposals(grid_strides, prob, BBOX_CONF_THRESH, proposals);
  qsort_descent_inplace(proposals);

  std::vector<int> picked;
  nms_sorted_bboxes(proposals, picked, NMS_THRESH);
  int count = picked.size();
  objects.resize(count);

  for (int i = 0; i < count; i++) {
    objects[i] = proposals[picked[i]];

    // adjust offset to original unpadded
    float x0 = (objects[i].rect.x) / scale;
    float y0 = (objects[i].rect.y) / scale;
    float x1 = (objects[i].rect.x + objects[i].rect.width) / scale;
    float y1 = (objects[i].rect.y + objects[i].rect.height) / scale;

    // clip
    x0 = std::max(std::min(x0, (float)(img_w - 1)), 0.f);
    y0 = std::max(std::min(y0, (float)(img_h - 1)), 0.f);
    x1 = std::max(std::min(x1, (float)(img_w - 1)), 0.f);
    y1 = std::max(std::min(y1, (float)(img_h - 1)), 0.f);

    objects[i].rect.x = x0;
    objects[i].rect.y = y0;
    objects[i].rect.width = x1 - x0;
    objects[i].rect.height = y1 - y0;
  }
}

const float color_list[80][3] = {
    {0.000, 0.447, 0.741}, {0.850, 0.325, 0.098}, {0.929, 0.694, 0.125},
    {0.494, 0.184, 0.556}, {0.466, 0.674, 0.188}, {0.301, 0.745, 0.933},
    {0.635, 0.078, 0.184}, {0.300, 0.300, 0.300}, {0.600, 0.600, 0.600},
    {1.000, 0.000, 0.000}, {1.000, 0.500, 0.000}, {0.749, 0.749, 0.000},
    {0.000, 1.000, 0.000}, {0.000, 0.000, 1.000}, {0.667, 0.000, 1.000},
    {0.333, 0.333, 0.000}, {0.333, 0.667, 0.000}, {0.333, 1.000, 0.000},
    {0.667, 0.333, 0.000}, {0.667, 0.667, 0.000}, {0.667, 1.000, 0.000},
    {1.000, 0.333, 0.000}, {1.000, 0.667, 0.000}, {1.000, 1.000, 0.000},
    {0.000, 0.333, 0.500}, {0.000, 0.667, 0.500}, {0.000, 1.000, 0.500},
    {0.333, 0.000, 0.500}, {0.333, 0.333, 0.500}, {0.333, 0.667, 0.500},
    {0.333, 1.000, 0.500}, {0.667, 0.000, 0.500}, {0.667, 0.333, 0.500},
    {0.667, 0.667, 0.500}, {0.667, 1.000, 0.500}, {1.000, 0.000, 0.500},
    {1.000, 0.333, 0.500}, {1.000, 0.667, 0.500}, {1.000, 1.000, 0.500},
    {0.000, 0.333, 1.000}, {0.000, 0.667, 1.000}, {0.000, 1.000, 1.000},
    {0.333, 0.000, 1.000}, {0.333, 0.333, 1.000}, {0.333, 0.667, 1.000},
    {0.333, 1.000, 1.000}, {0.667, 0.000, 1.000}, {0.667, 0.333, 1.000},
    {0.667, 0.667, 1.000}, {0.667, 1.000, 1.000}, {1.000, 0.000, 1.000},
    {1.000, 0.333, 1.000}, {1.000, 0.667, 1.000}, {0.333, 0.000, 0.000},
    {0.500, 0.000, 0.000}, {0.667, 0.000, 0.000}, {0.833, 0.000, 0.000},
    {1.000, 0.000, 0.000}, {0.000, 0.167, 0.000}, {0.000, 0.333, 0.000},
    {0.000, 0.500, 0.000}, {0.000, 0.667, 0.000}, {0.000, 0.833, 0.000},
    {0.000, 1.000, 0.000}, {0.000, 0.000, 0.167}, {0.000, 0.000, 0.333},
    {0.000, 0.000, 0.500}, {0.000, 0.000, 0.667}, {0.000, 0.000, 0.833},
    {0.000, 0.000, 1.000}, {0.000, 0.000, 0.000}, {0.143, 0.143, 0.143},
    {0.286, 0.286, 0.286}, {0.429, 0.429, 0.429}, {0.571, 0.571, 0.571},
    {0.714, 0.714, 0.714}, {0.857, 0.857, 0.857}, {0.000, 0.447, 0.741},
    {0.314, 0.717, 0.741}, {0.50, 0.5, 0}};

static void draw_objects(const cv::Mat &bgr,
                         const std::vector<Object> &objects) {
  static const char *class_names[] = {
      "person",        "bicycle",      "car",
      "motorcycle",    "airplane",     "bus",
      "train",         "truck",        "boat",
      "traffic light", "fire hydrant", "stop sign",
      "parking meter", "bench",        "bird",
      "cat",           "dog",          "horse",
      "sheep",         "cow",          "elephant",
      "bear",          "zebra",        "giraffe",
      "backpack",      "umbrella",     "handbag",
      "tie",           "suitcase",     "frisbee",
      "skis",          "snowboard",    "sports ball",
      "kite",          "baseball bat", "baseball glove",
      "skateboard",    "surfboard",    "tennis racket",
      "bottle",        "wine glass",   "cup",
      "fork",          "knife",        "spoon",
      "bowl",          "banana",       "apple",
      "sandwich",      "orange",       "broccoli",
      "carrot",        "hot dog",      "pizza",
      "donut",         "cake",         "chair",
      "couch",         "potted plant", "bed",
      "dining table",  "toilet",       "tv",
      "laptop",        "mouse",        "remote",
      "keyboard",      "cell phone",   "microwave",
      "oven",          "toaster",      "sink",
      "refrigerator",  "book",         "clock",
      "vase",          "scissors",     "teddy bear",
      "hair drier",    "toothbrush"};

  cv::Mat image = bgr.clone();

  for (size_t i = 0; i < objects.size(); i++) {
    const Object &obj = objects[i];

    fprintf(stderr, "%d = %.5f at %.2f %.2f %.2f x %.2f\n", obj.label, obj.prob,
            obj.rect.x, obj.rect.y, obj.rect.width, obj.rect.height);

    cv::Scalar color =
        cv::Scalar(color_list[obj.label][0], color_list[obj.label][1],
                   color_list[obj.label][2]);
    float c_mean = cv::mean(color)[0];
    cv::Scalar txt_color;
    if (c_mean > 0.5) {
      txt_color = cv::Scalar(0, 0, 0);
    } else {
      txt_color = cv::Scalar(255, 255, 255);
    }

    cv::rectangle(image, obj.rect, color * 255, 2);

    char text[256];
    sprintf(text, "%s %.1f%%", class_names[obj.label], obj.prob * 100);

    int baseLine = 0;
    cv::Size label_size =
        cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine);

    cv::Scalar txt_bk_color = color * 0.7 * 255;

    int x = obj.rect.x;
    int y = obj.rect.y + 1;
    // int y = obj.rect.y - label_size.height - baseLine;
    if (y > image.rows)
      y = image.rows;
    // if (x + label_size.width > image.cols)
    //   x = image.cols - label_size.width;

    cv::rectangle(
        image,
        cv::Rect(cv::Point(x, y),
                 cv::Size(label_size.width, label_size.height + baseLine)),
        txt_bk_color, -1);

    cv::putText(image, text, cv::Point(x, y + label_size.height),
                cv::FONT_HERSHEY_SIMPLEX, 0.4, txt_color, 1);
  }

  cv::imwrite("out.jpg", image);
  std::cout << "save output to out.jpg" << std::endl;
}

cg::ComputingGraph::OutputSpecItem make_callback_copy(SymbolVar dev,
                                                      HostTensorND &host) {
  auto cb = [&host](DeviceTensorND &d) { host.copy_from(d); };
  return {dev, cb};
}

int main(int argc, char *argv[]) {
  serialization::GraphLoader::LoadConfig load_config;
  load_config.comp_graph = ComputingGraph::make();
  auto &&graph_opt = load_config.comp_graph->options();
  graph_opt.graph_opt_level = 0;

  if (argc != 9) {
    std::cout << "Usage : " << argv[0]
              << " <path_to_model> <path_to_image> <device> <warmup_count> "
                 "<thread_number> <use_fast_run> <use_weight_preprocess> "
                 "<run_with_fp16>"
              << std::endl;
    return EXIT_FAILURE;
  }

  const std::string input_model{argv[1]};
  const std::string input_image_path{argv[2]};
  const std::string device{argv[3]};
  const size_t warmup_count = atoi(argv[4]);
  const size_t thread_number = atoi(argv[5]);
  const size_t use_fast_run = atoi(argv[6]);
  const size_t use_weight_preprocess = atoi(argv[7]);
  const size_t run_with_fp16 = atoi(argv[8]);

  if (device == "cuda") {
    load_config.comp_node_mapper = [](CompNode::Locator &loc) {
      loc.type = CompNode::DeviceType::CUDA;
    };
  } else if (device == "cpu") {
    load_config.comp_node_mapper = [](CompNode::Locator &loc) {
      loc.type = CompNode::DeviceType::CPU;
    };
  } else if (device == "multithread") {
    load_config.comp_node_mapper = [thread_number](CompNode::Locator &loc) {
      loc.type = CompNode::DeviceType::MULTITHREAD;
      loc.device = 0;
      loc.stream = thread_number;
    };
    std::cout << "use " << thread_number << " threads" << std::endl;
  } else {
    std::cout << "device only supports cuda, cpu or multithread" << std::endl;
    return EXIT_FAILURE;
  }

  if (use_weight_preprocess) {
    std::cout << "use weight preprocess" << std::endl;
    graph_opt.graph_opt.enable_weight_preprocess();
  }
  if (run_with_fp16) {
    std::cout << "run with fp16" << std::endl;
    graph_opt.graph_opt.enable_f16_io_comp();
  }

  if (device == "cuda") {
    std::cout << "choose format for cuda" << std::endl;
  } else {
    std::cout << "choose format for non-cuda" << std::endl;
#if defined(__arm__) || defined(__aarch64__)
    if (run_with_fp16) {
      std::cout << "use chw format when fp16 is enabled" << std::endl;
    } else {
      std::cout << "choose nchw44 format for aarch64" << std::endl;
      graph_opt.graph_opt.enable_nchw44();
    }
#endif
#if defined(__x86_64__) || defined(__amd64__) || defined(__i386__)
    // graph_opt.graph_opt.enable_nchw88();
#endif
  }

  std::unique_ptr<serialization::InputFile> inp_file =
      serialization::InputFile::make_fs(input_model.c_str());
  auto loader = serialization::GraphLoader::make(std::move(inp_file));
  serialization::GraphLoader::LoadResult network =
      loader->load(load_config, false);

  if (use_fast_run) {
    std::cout << "use fastrun" << std::endl;
    using S = opr::mixin::AlgoChooserHelper::ExecutionPolicy::Strategy;
    S strategy = static_cast<S>(0);
    strategy = S::PROFILE | S::OPTIMIZED | strategy;
    mgb::gopt::modify_opr_algo_strategy_inplace(network.output_var_list,
                                                strategy);
  }

  auto data = network.tensor_map["data"];
  cv::Mat image = cv::imread(input_image_path);
  cv::Mat pr_img = static_resize(image);
  float *data_ptr = data->resize({1, 3, 640, 640}).ptr<float>();
  blobFromImage(pr_img, data_ptr);
  HostTensorND predict;
  std::unique_ptr<cg::AsyncExecutable> func = network.graph->compile(
      {make_callback_copy(network.output_var_map.begin()->second, predict)});

  for (size_t i = 0; i < warmup_count; i++) {
    std::cout << "warmup: " << i << std::endl;
    func->execute();
    func->wait();
  }
  auto start = std::chrono::system_clock::now();
  func->execute();
  func->wait();
  auto end = std::chrono::system_clock::now();
  std::chrono::duration<double> exec_seconds = end - start;
  std::cout << "elapsed time: " << exec_seconds.count() << "s" << std::endl;

  float *predict_ptr = predict.ptr<float>();
  int img_w = image.cols;
  int img_h = image.rows;
  float scale =
      std::min(INPUT_W / (image.cols * 1.0), INPUT_H / (image.rows * 1.0));
  std::vector<Object> objects;

  decode_outputs(predict_ptr, objects, scale, img_w, img_h);
  draw_objects(image, objects);

  return EXIT_SUCCESS;
}
```
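For readers who prefer Python, here is a minimal NumPy sketch of the decode math performed by `generate_grids_and_stride` and `generate_yolox_proposals` above. It is an illustration only, not part of the demo; the function name `decode` is ours.

```python
import numpy as np

INPUT_SIZE = 640
STRIDES = [8, 16, 32]

def decode(feat, conf_thresh=0.3):
    """feat: (num_anchors, 85) raw head output, laid out per anchor as
    [x, y, w, h, objectness, 80 class scores], same as the C++ expects."""
    grids, strides = [], []
    for s in STRIDES:
        n = INPUT_SIZE // s
        ys, xs = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        # x varies fastest, matching the g0 inner loop in the C++
        grids.append(np.stack([xs, ys], axis=-1).reshape(-1, 2))
        strides.append(np.full((n * n, 1), s))
    grids = np.concatenate(grids)      # (num_anchors, 2); 8400 for 640x640
    strides = np.concatenate(strides)  # (num_anchors, 1)

    xy = (feat[:, :2] + grids) * strides   # box centers in input-image pixels
    wh = np.exp(feat[:, 2:4]) * strides    # box sizes in input-image pixels
    scores = feat[:, 4:5] * feat[:, 5:]    # objectness * per-class score
    keep = scores.max(axis=1) > conf_thresh
    return xy[keep], wh[keep], scores[keep]
```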
demo/MegEngine/python/README.md ADDED (+33 lines)
````markdown
# YOLOX-Python-MegEngine

Python version of YOLOX object detection based on [MegEngine](https://github.com/MegEngine/MegEngine).

## Tutorial

### Step 1: install requirements

```
python3 -m pip install megengine -f https://megengine.org.cn/whl/mge.html
```

### Step 2: convert checkpoint weights from torch's path file

```
python3 convert_weights.py -w yolox_s.pth.tar -o yolox_s_mge.pkl
```

### Step 3: run the demo

This part is the same as torch's python demo, but there is no need to specify a device.

```
python3 demo.py image -n yolox-s -c yolox_s_mge.pkl --path ../../../assets/dog.jpg --conf 0.3 --nms 0.65 --tsize 640 --save_result
```

### [Optional] Step 4: dump the model for C++ inference

> **Note**: the resulting model is dumped with `optimize_for_inference` and `enable_fuse_conv_bias_nonlinearity`.

```
python3 dump.py -n yolox-s -c yolox_s_mge.pkl --dump_path yolox_s.mge
```
````
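The steps above can also be driven programmatically. A minimal sketch, assuming `process.py` from this folder (listed in the commit but not shown in this view) exposes `preprocess`/`postprocess` with the positional signatures used in `demo.py` below:

```python
import cv2
import megengine as mge
import megengine.functional as F

from build import build_and_load
from process import postprocess, preprocess  # assumed helpers, mirrored from demo.py

model = build_and_load("yolox_s_mge.pkl", name="yolox-s")  # output of Step 2
model.eval()

img = cv2.imread("../../../assets/dog.jpg")
inp, ratio = preprocess(img, (640, 640), (0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
outputs = model(F.expand_dims(mge.tensor(inp), 0))
# postprocess(outputs, num_classes, conf_thresh, nms_thresh) -> one entry per image
dets = postprocess(outputs, 80, 0.3, 0.65)[0]
print(None if dets is None else dets.shape)
```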
demo/MegEngine/python/build.py ADDED (+54 lines)
```python
#!/usr/bin/env python3
# -*- coding:utf-8 -*-

import megengine as mge
import megengine.module as M
from megengine import jit

from models.yolo_fpn import YOLOFPN
from models.yolo_head import YOLOXHead
from models.yolo_pafpn import YOLOPAFPN
from models.yolox import YOLOX


def build_yolox(name="yolox-s"):
    num_classes = 80

    # value meaning: depth, width
    param_dict = {
        "yolox-nano": (0.33, 0.25),
        "yolox-tiny": (0.33, 0.375),
        "yolox-s": (0.33, 0.50),
        "yolox-m": (0.67, 0.75),
        "yolox-l": (1.0, 1.0),
        "yolox-x": (1.33, 1.25),
    }
    if name == "yolov3":
        depth = 1.0
        width = 1.0
        backbone = YOLOFPN()
        head = YOLOXHead(num_classes, width, in_channels=[128, 256, 512], act="lrelu")
        model = YOLOX(backbone, head)
    else:
        assert name in param_dict
        kwargs = {}
        depth, width = param_dict[name]
        if name == "yolox-nano":
            kwargs["depthwise"] = True
        in_channels = [256, 512, 1024]
        backbone = YOLOPAFPN(depth, width, in_channels=in_channels, **kwargs)
        head = YOLOXHead(num_classes, width, in_channels=in_channels, **kwargs)
        model = YOLOX(backbone, head)

    for m in model.modules():
        if isinstance(m, M.BatchNorm2d):
            m.eps = 1e-3

    return model


def build_and_load(weight_file, name="yolox-s"):
    model = build_yolox(name)
    model_weights = mge.load(weight_file)
    model.load_state_dict(model_weights, strict=False)
    return model
```
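A quick way to see what the `(depth, width)` multipliers in `param_dict` imply for the backbone (see `models/darknet.py` below, where `base_channels = int(wid_mul * 64)` and `base_depth = max(round(dep_mul * 3), 1)`):

```python
# Derived backbone sizes for a few presets; pure arithmetic, no model needed.
for name, (depth, width) in {
    "yolox-s": (0.33, 0.50),
    "yolox-m": (0.67, 0.75),
    "yolox-l": (1.0, 1.0),
}.items():
    print(name, "base_channels =", int(width * 64),
          "base_depth =", max(round(depth * 3), 1))
# yolox-s base_channels = 32 base_depth = 1
# yolox-m base_channels = 48 base_depth = 2
# yolox-l base_channels = 64 base_depth = 3
```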
demo/MegEngine/python/coco_classes.py ADDED (+86 lines)
```python
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copyright (c) Megvii, Inc. and its affiliates.

COCO_CLASSES = (
    "person",
    "bicycle",
    "car",
    "motorcycle",
    "airplane",
    "bus",
    "train",
    "truck",
    "boat",
    "traffic light",
    "fire hydrant",
    "stop sign",
    "parking meter",
    "bench",
    "bird",
    "cat",
    "dog",
    "horse",
    "sheep",
    "cow",
    "elephant",
    "bear",
    "zebra",
    "giraffe",
    "backpack",
    "umbrella",
    "handbag",
    "tie",
    "suitcase",
    "frisbee",
    "skis",
    "snowboard",
    "sports ball",
    "kite",
    "baseball bat",
    "baseball glove",
    "skateboard",
    "surfboard",
    "tennis racket",
    "bottle",
    "wine glass",
    "cup",
    "fork",
    "knife",
    "spoon",
    "bowl",
    "banana",
    "apple",
    "sandwich",
    "orange",
    "broccoli",
    "carrot",
    "hot dog",
    "pizza",
    "donut",
    "cake",
    "chair",
    "couch",
    "potted plant",
    "bed",
    "dining table",
    "toilet",
    "tv",
    "laptop",
    "mouse",
    "remote",
    "keyboard",
    "cell phone",
    "microwave",
    "oven",
    "toaster",
    "sink",
    "refrigerator",
    "book",
    "clock",
    "vase",
    "scissors",
    "teddy bear",
    "hair drier",
    "toothbrush",
)
```
demo/MegEngine/python/convert_weights.py ADDED (+64 lines)
```python
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import argparse
from collections import OrderedDict

import megengine as mge
import torch


def make_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-w", "--weights", type=str, help="path of weight file")
    parser.add_argument(
        "-o",
        "--output",
        default="weight_mge.pkl",
        type=str,
        help="path of output weight file",
    )
    return parser


def numpy_weights(weight_file):
    torch_weights = torch.load(weight_file, map_location="cpu")
    if "model" in torch_weights:
        torch_weights = torch_weights["model"]
    new_dict = OrderedDict()
    for k, v in torch_weights.items():
        new_dict[k] = v.cpu().numpy()
    return new_dict


def map_weights(weight_file, output_file):
    torch_weights = numpy_weights(weight_file)

    new_dict = OrderedDict()
    for k, v in torch_weights.items():
        if "num_batches_tracked" in k:
            print("drop: {}".format(k))
            continue
        if k.endswith("bias"):
            print("bias key: {}".format(k))
            v = v.reshape(1, -1, 1, 1)
            new_dict[k] = v
        elif "dconv" in k and "conv.weight" in k:
            print("depthwise conv key: {}".format(k))
            cout, cin, k1, k2 = v.shape
            v = v.reshape(cout, 1, cin, k1, k2)
            new_dict[k] = v
        else:
            new_dict[k] = v

    mge.save(new_dict, output_file)
    print("save weights to {}".format(output_file))


def main():
    parser = make_parser()
    args = parser.parse_args()
    map_weights(args.weights, args.output)


if __name__ == "__main__":
    main()
```
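To illustrate the two reshapes `map_weights` performs, a small NumPy example on dummy arrays (the shapes are made up, not from a real checkpoint):

```python
import numpy as np

bias = np.zeros(64, dtype=np.float32)           # torch stores bias as (C,)
print(bias.reshape(1, -1, 1, 1).shape)          # MegEngine expects (1, 64, 1, 1)

dw = np.zeros((64, 1, 3, 3), dtype=np.float32)  # torch depthwise conv: (cout, cin, k, k)
cout, cin, k1, k2 = dw.shape
print(dw.reshape(cout, 1, cin, k1, k2).shape)   # MegEngine grouped conv: (64, 1, 1, 3, 3)
```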
demo/MegEngine/python/demo.py ADDED (+202 lines)
```python
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copyright (c) Megvii, Inc. and its affiliates.

import argparse
import os
import time

import cv2
import megengine as mge
import megengine.functional as F
from loguru import logger

from coco_classes import COCO_CLASSES
from process import postprocess, preprocess
from visualize import vis
from build import build_and_load

IMAGE_EXT = [".jpg", ".jpeg", ".webp", ".bmp", ".png"]


def make_parser():
    parser = argparse.ArgumentParser("YOLOX Demo!")
    parser.add_argument(
        "demo", default="image", help="demo type, eg. image, video and webcam"
    )
    parser.add_argument("-n", "--name", type=str, default="yolox-s", help="model name")
    parser.add_argument("--path", default="./test.png", help="path to images or video")
    parser.add_argument("--camid", type=int, default=0, help="webcam demo camera id")
    parser.add_argument(
        "--save_result",
        action="store_true",
        help="whether to save the inference result of image/video",
    )

    parser.add_argument("-c", "--ckpt", default=None, type=str, help="ckpt for eval")
    parser.add_argument("--conf", default=None, type=float, help="test conf")
    parser.add_argument("--nms", default=None, type=float, help="test nms threshold")
    parser.add_argument("--tsize", default=None, type=int, help="test img size")
    return parser


def get_image_list(path):
    image_names = []
    for maindir, subdir, file_name_list in os.walk(path):
        for filename in file_name_list:
            apath = os.path.join(maindir, filename)
            ext = os.path.splitext(apath)[1]
            if ext in IMAGE_EXT:
                image_names.append(apath)
    return image_names


class Predictor(object):
    def __init__(
        self,
        model,
        confthre=0.01,
        nmsthre=0.65,
        test_size=(640, 640),
        cls_names=COCO_CLASSES,
        trt_file=None,
        decoder=None,
    ):
        self.model = model
        self.cls_names = cls_names
        self.decoder = decoder
        self.num_classes = 80
        self.confthre = confthre
        self.nmsthre = nmsthre
        self.test_size = test_size
        self.rgb_means = (0.485, 0.456, 0.406)
        self.std = (0.229, 0.224, 0.225)

    def inference(self, img):
        img_info = {"id": 0}
        if isinstance(img, str):
            img_info["file_name"] = os.path.basename(img)
            img = cv2.imread(img)
            if img is None:
                raise ValueError("test image path is invalid!")
        else:
            img_info["file_name"] = None

        height, width = img.shape[:2]
        img_info["height"] = height
        img_info["width"] = width
        img_info["raw_img"] = img

        img, ratio = preprocess(img, self.test_size, self.rgb_means, self.std)
        img_info["ratio"] = ratio
        img = F.expand_dims(mge.tensor(img), 0)

        t0 = time.time()
        outputs = self.model(img)
        outputs = postprocess(outputs, self.num_classes, self.confthre, self.nmsthre)
        logger.info("Infer time: {:.4f}s".format(time.time() - t0))
        return outputs, img_info

    def visual(self, output, img_info, cls_conf=0.35):
        ratio = img_info["ratio"]
        img = img_info["raw_img"]
        if output is None:
            return img
        output = output.numpy()

        # preprocessing: resize
        bboxes = output[:, 0:4] / ratio

        cls = output[:, 6]
        scores = output[:, 4] * output[:, 5]

        vis_res = vis(img, bboxes, scores, cls, cls_conf, self.cls_names)
        return vis_res


def image_demo(predictor, vis_folder, path, current_time, save_result):
    if os.path.isdir(path):
        files = get_image_list(path)
    else:
        files = [path]
    files.sort()
    for image_name in files:
        outputs, img_info = predictor.inference(image_name)
        result_image = predictor.visual(outputs[0], img_info)
        if save_result:
            save_folder = os.path.join(
                vis_folder, time.strftime("%Y_%m_%d_%H_%M_%S", current_time)
            )
            os.makedirs(save_folder, exist_ok=True)
            save_file_name = os.path.join(save_folder, os.path.basename(image_name))
            logger.info("Saving detection result in {}".format(save_file_name))
            cv2.imwrite(save_file_name, result_image)
        ch = cv2.waitKey(0)
        if ch == 27 or ch == ord("q") or ch == ord("Q"):
            break


def imageflow_demo(predictor, vis_folder, current_time, args):
    cap = cv2.VideoCapture(args.path if args.demo == "video" else args.camid)
    width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)  # float
    height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)  # float
    fps = cap.get(cv2.CAP_PROP_FPS)
    save_folder = os.path.join(
        vis_folder, time.strftime("%Y_%m_%d_%H_%M_%S", current_time)
    )
    os.makedirs(save_folder, exist_ok=True)
    if args.demo == "video":
        save_path = os.path.join(save_folder, args.path.split("/")[-1])
    else:
        save_path = os.path.join(save_folder, "camera.mp4")
    logger.info(f"video save_path is {save_path}")
    vid_writer = cv2.VideoWriter(
        save_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (int(width), int(height))
    )
    while True:
        ret_val, frame = cap.read()
        if ret_val:
            outputs, img_info = predictor.inference(frame)
            result_frame = predictor.visual(outputs[0], img_info)
            if args.save_result:
                vid_writer.write(result_frame)
            ch = cv2.waitKey(1)
            if ch == 27 or ch == ord("q") or ch == ord("Q"):
                break
        else:
            break


def main(args):
    file_name = os.path.join("./yolox_outputs", args.name)
    os.makedirs(file_name, exist_ok=True)

    vis_folder = None  # stays None unless results are saved
    if args.save_result:
        vis_folder = os.path.join(file_name, "vis_res")
        os.makedirs(vis_folder, exist_ok=True)

    confthre = 0.01
    nmsthre = 0.65
    test_size = (640, 640)
    if args.conf is not None:
        confthre = args.conf
    if args.nms is not None:
        nmsthre = args.nms
    if args.tsize is not None:
        test_size = (args.tsize, args.tsize)

    model = build_and_load(args.ckpt, name=args.name)
    model.eval()

    predictor = Predictor(model, confthre, nmsthre, test_size, COCO_CLASSES, None, None)
    current_time = time.localtime()
    if args.demo == "image":
        image_demo(predictor, vis_folder, args.path, current_time, args.save_result)
    elif args.demo == "video" or args.demo == "webcam":
        imageflow_demo(predictor, vis_folder, current_time, args)


if __name__ == "__main__":
    args = make_parser().parse_args()
    main(args)
```
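`Predictor` can also be used directly without the CLI. A minimal sketch, assuming it is run from this folder with a converted checkpoint at the hypothetical path `yolox_s_mge.pkl`:

```python
from build import build_and_load
from demo import Predictor  # demo.py guards its CLI behind __main__, so importing is safe

model = build_and_load("yolox_s_mge.pkl", name="yolox-s")
model.eval()

predictor = Predictor(model, confthre=0.3, nmsthre=0.65, test_size=(640, 640))
outputs, img_info = predictor.inference("../../../assets/dog.jpg")
result = predictor.visual(outputs[0], img_info, cls_conf=0.35)  # BGR image with boxes drawn
```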
demo/MegEngine/python/dump.py ADDED (+51 lines)
```python
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copyright (c) Megvii, Inc. and its affiliates.

import argparse

import megengine as mge
import numpy as np
from megengine import jit

from build import build_and_load


def make_parser():
    parser = argparse.ArgumentParser("YOLOX Demo Dump")
    parser.add_argument("-n", "--name", type=str, default="yolox-s", help="model name")
    parser.add_argument("-c", "--ckpt", default=None, type=str, help="ckpt for eval")
    parser.add_argument(
        "--dump_path", default="model.mge", help="path to save the dumped model"
    )
    return parser


def dump_static_graph(model, graph_name="model.mge"):
    model.eval()
    model.head.decode_in_inference = False

    data = mge.Tensor(np.random.random((1, 3, 640, 640)))

    @jit.trace(capture_as_const=True)
    def pred_func(data):
        outputs = model(data)
        return outputs

    pred_func(data)
    pred_func.dump(
        graph_name,
        arg_names=["data"],
        optimize_for_inference=True,
        enable_fuse_conv_bias_nonlinearity=True,
    )


def main(args):
    model = build_and_load(args.ckpt, name=args.name)
    dump_static_graph(model, args.dump_path)


if __name__ == "__main__":
    args = make_parser().parse_args()
    main(args)
```
demo/MegEngine/python/models/__init__.py ADDED (+9 lines)
```python
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copyright (c) 2014-2021 Megvii Inc. All rights reserved.

from .darknet import CSPDarknet, Darknet
from .yolo_fpn import YOLOFPN
from .yolo_head import YOLOXHead
from .yolo_pafpn import YOLOPAFPN
from .yolox import YOLOX
```
demo/MegEngine/python/models/darknet.py ADDED (+154 lines)
1 |
+
#!/usr/bin/env python3
|
2 |
+
# -*- encoding: utf-8 -*-
|
3 |
+
# Copyright (c) 2014-2021 Megvii Inc. All rights reserved.
|
4 |
+
|
5 |
+
import megengine.module as M
|
6 |
+
|
7 |
+
from .network_blocks import BaseConv, CSPLayer, DWConv, Focus, ResLayer, SPPBottleneck
|
8 |
+
|
9 |
+
|
10 |
+
class Darknet(M.Module):
|
11 |
+
# number of blocks from dark2 to dark5.
|
12 |
+
depth2blocks = {21: [1, 2, 2, 1], 53: [2, 8, 8, 4]}
|
13 |
+
|
14 |
+
def __init__(
|
15 |
+
self, depth, in_channels=3, stem_out_channels=32, out_features=("dark3", "dark4", "dark5"),
|
16 |
+
):
|
17 |
+
"""
|
18 |
+
Args:
|
19 |
+
depth (int): depth of darknet used in model, usually use [21, 53] for this param.
|
20 |
+
in_channels (int): number of input channels, for example, use 3 for RGB image.
|
21 |
+
stem_out_channels (int): number of output chanels of darknet stem.
|
22 |
+
It decides channels of darknet layer2 to layer5.
|
23 |
+
out_features (Tuple[str]): desired output layer name.
|
24 |
+
"""
|
25 |
+
super().__init__()
|
26 |
+
assert out_features, "please provide output features of Darknet"
|
27 |
+
self.out_features = out_features
|
28 |
+
self.stem = M.Sequential(
|
29 |
+
BaseConv(in_channels, stem_out_channels, ksize=3, stride=1, act="lrelu"),
|
30 |
+
*self.make_group_layer(stem_out_channels, num_blocks=1, stride=2),
|
31 |
+
)
|
32 |
+
in_channels = stem_out_channels * 2 # 64
|
33 |
+
|
34 |
+
num_blocks = Darknet.depth2blocks[depth]
|
35 |
+
# create darknet with `stem_out_channels` and `num_blocks` layers.
|
36 |
+
# to make model structure more clear, we don't use `for` statement in python.
|
37 |
+
self.dark2 = M.Sequential(*self.make_group_layer(in_channels, num_blocks[0], stride=2))
|
38 |
+
in_channels *= 2 # 128
|
39 |
+
self.dark3 = M.Sequential(*self.make_group_layer(in_channels, num_blocks[1], stride=2))
|
40 |
+
in_channels *= 2 # 256
|
41 |
+
self.dark4 = M.Sequential(*self.make_group_layer(in_channels, num_blocks[2], stride=2))
|
42 |
+
in_channels *= 2 # 512
|
43 |
+
|
44 |
+
self.dark5 = M.Sequential(
|
45 |
+
*self.make_group_layer(in_channels, num_blocks[3], stride=2),
|
46 |
+
            *self.make_spp_block([in_channels, in_channels * 2], in_channels * 2),
        )

    def make_group_layer(self, in_channels: int, num_blocks: int, stride: int = 1):
        "starts with conv layer then has `num_blocks` `ResLayer`"
        return [
            BaseConv(in_channels, in_channels * 2, ksize=3, stride=stride, act="lrelu"),
            *[(ResLayer(in_channels * 2)) for _ in range(num_blocks)]
        ]

    def make_spp_block(self, filters_list, in_filters):
        m = M.Sequential(
            *[
                BaseConv(in_filters, filters_list[0], 1, stride=1, act="lrelu"),
                BaseConv(filters_list[0], filters_list[1], 3, stride=1, act="lrelu"),
                SPPBottleneck(
                    in_channels=filters_list[1],
                    out_channels=filters_list[0],
                    activation="lrelu"
                ),
                BaseConv(filters_list[0], filters_list[1], 3, stride=1, act="lrelu"),
                BaseConv(filters_list[1], filters_list[0], 1, stride=1, act="lrelu"),
            ]
        )
        return m

    def forward(self, x):
        outputs = {}
        x = self.stem(x)
        outputs["stem"] = x
        x = self.dark2(x)
        outputs["dark2"] = x
        x = self.dark3(x)
        outputs["dark3"] = x
        x = self.dark4(x)
        outputs["dark4"] = x
        x = self.dark5(x)
        outputs["dark5"] = x
        return {k: v for k, v in outputs.items() if k in self.out_features}


class CSPDarknet(M.Module):

    def __init__(
        self, dep_mul, wid_mul,
        out_features=("dark3", "dark4", "dark5"),
        depthwise=False, act="silu",
    ):
        super().__init__()
        assert out_features, "please provide output features of Darknet"
        self.out_features = out_features
        Conv = DWConv if depthwise else BaseConv

        base_channels = int(wid_mul * 64)  # 64
        base_depth = max(round(dep_mul * 3), 1)  # 3

        # stem
        self.stem = Focus(3, base_channels, ksize=3, act=act)

        # dark2
        self.dark2 = M.Sequential(
            Conv(base_channels, base_channels * 2, 3, 2, act=act),
            CSPLayer(
                base_channels * 2, base_channels * 2,
                n=base_depth, depthwise=depthwise, act=act
            ),
        )

        # dark3
        self.dark3 = M.Sequential(
            Conv(base_channels * 2, base_channels * 4, 3, 2, act=act),
            CSPLayer(
                base_channels * 4, base_channels * 4,
                n=base_depth * 3, depthwise=depthwise, act=act,
            ),
        )

        # dark4
        self.dark4 = M.Sequential(
            Conv(base_channels * 4, base_channels * 8, 3, 2, act=act),
            CSPLayer(
                base_channels * 8, base_channels * 8,
                n=base_depth * 3, depthwise=depthwise, act=act,
            ),
        )

        # dark5
        self.dark5 = M.Sequential(
            Conv(base_channels * 8, base_channels * 16, 3, 2, act=act),
            SPPBottleneck(base_channels * 16, base_channels * 16, activation=act),
            CSPLayer(
                base_channels * 16, base_channels * 16, n=base_depth,
                shortcut=False, depthwise=depthwise, act=act,
            ),
        )

    def forward(self, x):
        outputs = {}
        x = self.stem(x)
        outputs["stem"] = x
        x = self.dark2(x)
        outputs["dark2"] = x
        x = self.dark3(x)
        outputs["dark3"] = x
        x = self.dark4(x)
        outputs["dark4"] = x
        x = self.dark5(x)
        outputs["dark5"] = x
        return {k: v for k, v in outputs.items() if k in self.out_features}
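Both Darknet variants end by filtering their stage dict through `out_features`, so a downstream FPN can pick stages by name. A minimal sketch of exercising `CSPDarknet` (not part of the commit; assumes it is run from `demo/MegEngine/python/` and uses the YOLOX-s multipliers):

# Sketch only: run the CSP backbone on a dummy image and check stage strides.
import numpy as np
import megengine as mge

from models.darknet import CSPDarknet

backbone = CSPDarknet(dep_mul=0.33, wid_mul=0.50)  # YOLOX-s depth/width
backbone.eval()
x = mge.tensor(np.zeros((1, 3, 640, 640), dtype=np.float32))
for name, feat in backbone(x).items():
    print(name, feat.shape)  # dark3 at stride 8, dark4 at 16, dark5 at 32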
demo/MegEngine/python/models/network_blocks.py
ADDED
@@ -0,0 +1,183 @@
#!/usr/bin/env python3
# -*- encoding: utf-8 -*-
# Copyright (c) 2014-2021 Megvii Inc. All rights reserved.

import megengine.functional as F
import megengine.module as M


class UpSample(M.Module):

    def __init__(self, scale_factor=2, mode="bilinear"):
        super().__init__()
        self.scale_factor = scale_factor
        self.mode = mode

    def forward(self, x):
        return F.vision.interpolate(x, scale_factor=self.scale_factor, mode=self.mode)


class SiLU(M.Module):
    """export-friendly version of M.SiLU()"""

    @staticmethod
    def forward(x):
        return x * F.sigmoid(x)


def get_activation(name="silu"):
    if name == "silu":
        module = SiLU()
    elif name == "relu":
        module = M.ReLU()
    elif name == "lrelu":
        module = M.LeakyReLU(0.1)
    else:
        raise AttributeError("Unsupported act type: {}".format(name))
    return module


class BaseConv(M.Module):
    """A Conv2d -> Batchnorm -> silu/leaky relu block"""

    def __init__(self, in_channels, out_channels, ksize, stride, groups=1, bias=False, act="silu"):
        super().__init__()
        # same padding
        pad = (ksize - 1) // 2
        self.conv = M.Conv2d(
            in_channels,
            out_channels,
            kernel_size=ksize,
            stride=stride,
            padding=pad,
            groups=groups,
            bias=bias,
        )
        self.bn = M.BatchNorm2d(out_channels)
        self.act = get_activation(act)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def fuseforward(self, x):
        return self.act(self.conv(x))


class DWConv(M.Module):
    """Depthwise Conv + Conv"""
    def __init__(self, in_channels, out_channels, ksize, stride=1, act="silu"):
        super().__init__()
        self.dconv = BaseConv(
            in_channels, in_channels, ksize=ksize,
            stride=stride, groups=in_channels, act=act
        )
        self.pconv = BaseConv(
            in_channels, out_channels, ksize=1,
            stride=1, groups=1, act=act
        )

    def forward(self, x):
        x = self.dconv(x)
        return self.pconv(x)


class Bottleneck(M.Module):
    # Standard bottleneck
    def __init__(
        self, in_channels, out_channels, shortcut=True,
        expansion=0.5, depthwise=False, act="silu"
    ):
        super().__init__()
        hidden_channels = int(out_channels * expansion)
        Conv = DWConv if depthwise else BaseConv
        self.conv1 = BaseConv(in_channels, hidden_channels, 1, stride=1, act=act)
        self.conv2 = Conv(hidden_channels, out_channels, 3, stride=1, act=act)
        self.use_add = shortcut and in_channels == out_channels

    def forward(self, x):
        y = self.conv2(self.conv1(x))
        if self.use_add:
            y = y + x
        return y


class ResLayer(M.Module):
    "Residual layer with `in_channels` inputs."
    def __init__(self, in_channels: int):
        super().__init__()
        mid_channels = in_channels // 2
        self.layer1 = BaseConv(in_channels, mid_channels, ksize=1, stride=1, act="lrelu")
        self.layer2 = BaseConv(mid_channels, in_channels, ksize=3, stride=1, act="lrelu")

    def forward(self, x):
        out = self.layer2(self.layer1(x))
        return x + out


class SPPBottleneck(M.Module):
    """Spatial pyramid pooling layer used in YOLOv3-SPP"""
    def __init__(self, in_channels, out_channels, kernel_sizes=(5, 9, 13), activation="silu"):
        super().__init__()
        hidden_channels = in_channels // 2
        self.conv1 = BaseConv(in_channels, hidden_channels, 1, stride=1, act=activation)
        self.m = [M.MaxPool2d(kernel_size=ks, stride=1, padding=ks // 2) for ks in kernel_sizes]
        conv2_channels = hidden_channels * (len(kernel_sizes) + 1)
        self.conv2 = BaseConv(conv2_channels, out_channels, 1, stride=1, act=activation)

    def forward(self, x):
        x = self.conv1(x)
        x = F.concat([x] + [m(x) for m in self.m], axis=1)
        x = self.conv2(x)
        return x


class CSPLayer(M.Module):
    """C3 in yolov5, CSP Bottleneck with 3 convolutions"""

    def __init__(
        self, in_channels, out_channels, n=1,
        shortcut=True, expansion=0.5, depthwise=False, act="silu"
    ):
        """
        Args:
            in_channels (int): input channels.
            out_channels (int): output channels.
            n (int): number of Bottlenecks. Default value: 1.
        """
        # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        hidden_channels = int(out_channels * expansion)  # hidden channels
        self.conv1 = BaseConv(in_channels, hidden_channels, 1, stride=1, act=act)
        self.conv2 = BaseConv(in_channels, hidden_channels, 1, stride=1, act=act)
        self.conv3 = BaseConv(2 * hidden_channels, out_channels, 1, stride=1, act=act)
        module_list = [
            Bottleneck(hidden_channels, hidden_channels, shortcut, 1.0, depthwise, act=act)
            for _ in range(n)
        ]
        self.m = M.Sequential(*module_list)

    def forward(self, x):
        x_1 = self.conv1(x)
        x_2 = self.conv2(x)
        x_1 = self.m(x_1)
        x = F.concat((x_1, x_2), axis=1)
        return self.conv3(x)


class Focus(M.Module):
    """Focus width and height information into channel space."""

    def __init__(self, in_channels, out_channels, ksize=1, stride=1, act="silu"):
        super().__init__()
        self.conv = BaseConv(in_channels * 4, out_channels, ksize, stride, act=act)

    def forward(self, x):
        # shape of x (b,c,w,h) -> y(b,4c,w/2,h/2)
        patch_top_left = x[..., ::2, ::2]
        patch_top_right = x[..., ::2, 1::2]
        patch_bot_left = x[..., 1::2, ::2]
        patch_bot_right = x[..., 1::2, 1::2]
        x = F.concat(
            (patch_top_left, patch_bot_left, patch_top_right, patch_bot_right,), axis=1,
        )
        return self.conv(x)
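The slicing in `Focus.forward` is a space-to-depth move: each 2x2 pixel neighborhood is stacked into channels before the first convolution, halving width and height while quadrupling channels, with no information lost. A tiny numpy check of the same slicing (sketch, not part of the commit):

# Sketch only: verify the Focus rearrangement (b, c, h, w) -> (b, 4c, h/2, w/2).
import numpy as np

x = np.arange(3 * 4 * 4, dtype=np.float32).reshape(1, 3, 4, 4)
patches = np.concatenate(
    (x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]),
    axis=1,
)
assert patches.shape == (1, 12, 2, 2)  # 4x channels, half the spatial size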
demo/MegEngine/python/models/yolo_fpn.py
ADDED
@@ -0,0 +1,78 @@
#!/usr/bin/env python3
# -*- encoding: utf-8 -*-
# Copyright (c) 2014-2021 Megvii Inc. All rights reserved.

import megengine.functional as F
import megengine.module as M

from .darknet import Darknet
from .network_blocks import BaseConv, UpSample


class YOLOFPN(M.Module):
    """
    YOLOFPN module. Darknet 53 is the default backbone of this model.
    """

    def __init__(
        self, depth=53, in_features=["dark3", "dark4", "dark5"],
    ):
        super().__init__()

        self.backbone = Darknet(depth)
        self.in_features = in_features

        # out 1
        self.out1_cbl = self._make_cbl(512, 256, 1)
        self.out1 = self._make_embedding([256, 512], 512 + 256)

        # out 2
        self.out2_cbl = self._make_cbl(256, 128, 1)
        self.out2 = self._make_embedding([128, 256], 256 + 128)

        # upsample
        self.upsample = UpSample(scale_factor=2, mode="bilinear")

    def _make_cbl(self, _in, _out, ks):
        return BaseConv(_in, _out, ks, stride=1, act="lrelu")

    def _make_embedding(self, filters_list, in_filters):
        m = M.Sequential(
            *[
                self._make_cbl(in_filters, filters_list[0], 1),
                self._make_cbl(filters_list[0], filters_list[1], 3),

                self._make_cbl(filters_list[1], filters_list[0], 1),

                self._make_cbl(filters_list[0], filters_list[1], 3),
                self._make_cbl(filters_list[1], filters_list[0], 1),
            ]
        )
        return m

    def forward(self, inputs):
        """
        Args:
            inputs (Tensor): input image.

        Returns:
            Tuple[Tensor]: FPN output features.
        """
        # backbone
        out_features = self.backbone(inputs)
        x2, x1, x0 = [out_features[f] for f in self.in_features]

        # yolo branch 1
        x1_in = self.out1_cbl(x0)
        x1_in = self.upsample(x1_in)
        x1_in = F.concat([x1_in, x1], 1)
        out_dark4 = self.out1(x1_in)

        # yolo branch 2
        x2_in = self.out2_cbl(out_dark4)
        x2_in = self.upsample(x2_in)
        x2_in = F.concat([x2_in, x2], 1)
        out_dark3 = self.out2(x2_in)

        outputs = (out_dark3, out_dark4, x0)
        return outputs
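A hedged sketch of running this Darknet-53 FPN (not part of the commit; the shapes in the comment are what the channel bookkeeping above implies for a 416x416 input):

# Sketch only: the three outputs are ordered fine-to-coarse (stride 8/16/32).
import numpy as np
import megengine as mge

from models.yolo_fpn import YOLOFPN

fpn = YOLOFPN(depth=53)
fpn.eval()
outs = fpn(mge.tensor(np.zeros((1, 3, 416, 416), dtype=np.float32)))
for o in outs:
    print(o.shape)  # expected (1, 128, 52, 52), (1, 256, 26, 26), (1, 512, 13, 13)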
demo/MegEngine/python/models/yolo_head.py
ADDED
@@ -0,0 +1,193 @@
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copyright (c) 2014-2021 Megvii Inc. All rights reserved.

import megengine.functional as F
import megengine.module as M

from .network_blocks import BaseConv, DWConv


def meshgrid(x, y):
    """meshgrid wrapper for megengine"""
    assert len(x.shape) == 1
    assert len(y.shape) == 1
    mesh_shape = (y.shape[0], x.shape[0])
    mesh_x = F.broadcast_to(x, mesh_shape)
    mesh_y = F.broadcast_to(y.reshape(-1, 1), mesh_shape)
    return mesh_x, mesh_y


class YOLOXHead(M.Module):
    def __init__(
        self, num_classes, width=1.0, strides=[8, 16, 32],
        in_channels=[256, 512, 1024], act="silu", depthwise=False
    ):
        """
        Args:
            act (str): activation type of conv. Default value: "silu".
            depthwise (bool): whether to apply depthwise conv in conv branch. Default value: False.
        """
        super().__init__()

        self.n_anchors = 1
        self.num_classes = num_classes
        self.decode_in_inference = True  # save for matching

        self.cls_convs = []
        self.reg_convs = []
        self.cls_preds = []
        self.reg_preds = []
        self.obj_preds = []
        self.stems = []
        Conv = DWConv if depthwise else BaseConv

        for i in range(len(in_channels)):
            self.stems.append(
                BaseConv(
                    in_channels=int(in_channels[i] * width),
                    out_channels=int(256 * width),
                    ksize=1,
                    stride=1,
                    act=act,
                )
            )
            self.cls_convs.append(
                M.Sequential(
                    *[
                        Conv(
                            in_channels=int(256 * width),
                            out_channels=int(256 * width),
                            ksize=3,
                            stride=1,
                            act=act,
                        ),
                        Conv(
                            in_channels=int(256 * width),
                            out_channels=int(256 * width),
                            ksize=3,
                            stride=1,
                            act=act,
                        ),
                    ]
                )
            )
            self.reg_convs.append(
                M.Sequential(
                    *[
                        Conv(
                            in_channels=int(256 * width),
                            out_channels=int(256 * width),
                            ksize=3,
                            stride=1,
                            act=act,
                        ),
                        Conv(
                            in_channels=int(256 * width),
                            out_channels=int(256 * width),
                            ksize=3,
                            stride=1,
                            act=act,
                        ),
                    ]
                )
            )
            self.cls_preds.append(
                M.Conv2d(
                    in_channels=int(256 * width),
                    out_channels=self.n_anchors * self.num_classes,
                    kernel_size=1,
                    stride=1,
                    padding=0,
                )
            )
            self.reg_preds.append(
                M.Conv2d(
                    in_channels=int(256 * width),
                    out_channels=4,
                    kernel_size=1,
                    stride=1,
                    padding=0,
                )
            )
            self.obj_preds.append(
                M.Conv2d(
                    in_channels=int(256 * width),
                    out_channels=self.n_anchors * 1,
                    kernel_size=1,
                    stride=1,
                    padding=0,
                )
            )

        self.use_l1 = False
        self.strides = strides
        self.grids = [F.zeros(1)] * len(in_channels)
        self.expanded_strides = [None] * len(in_channels)

    def forward(self, xin, labels=None, imgs=None):
        outputs = []
        assert not self.training

        for k, (cls_conv, reg_conv, stride_this_level, x) in enumerate(
            zip(self.cls_convs, self.reg_convs, self.strides, xin)
        ):
            x = self.stems[k](x)
            cls_x = x
            reg_x = x

            cls_feat = cls_conv(cls_x)
            cls_output = self.cls_preds[k](cls_feat)

            reg_feat = reg_conv(reg_x)
            reg_output = self.reg_preds[k](reg_feat)
            obj_output = self.obj_preds[k](reg_feat)
            output = F.concat([reg_output, F.sigmoid(obj_output), F.sigmoid(cls_output)], 1)
            outputs.append(output)

        self.hw = [x.shape[-2:] for x in outputs]
        # [batch, n_anchors_all, 85]
        outputs = F.concat([F.flatten(x, start_axis=2) for x in outputs], axis=2)
        outputs = F.transpose(outputs, (0, 2, 1))
        if self.decode_in_inference:
            return self.decode_outputs(outputs)
        else:
            return outputs

    def get_output_and_grid(self, output, k, stride, dtype):
        # Training-time helper; unused in this inference-only demo
        # (forward asserts not self.training).
        grid = self.grids[k]

        batch_size = output.shape[0]
        n_ch = 5 + self.num_classes
        hsize, wsize = output.shape[-2:]
        if grid.shape[2:4] != output.shape[2:4]:
            xv, yv = meshgrid(F.arange(wsize), F.arange(hsize))
            grid = F.stack((xv, yv), 2).reshape(1, 1, hsize, wsize, 2).astype(dtype)
            self.grids[k] = grid

        output = output.reshape(batch_size, self.n_anchors, n_ch, hsize, wsize)
        output = (
            F.transpose(output, (0, 1, 3, 4, 2))
            .reshape(batch_size, self.n_anchors * hsize * wsize, -1)
        )
        grid = grid.reshape(1, -1, 2)
        output[..., :2] = (output[..., :2] + grid) * stride
        output[..., 2:4] = F.exp(output[..., 2:4]) * stride
        return output, grid

    def decode_outputs(self, outputs):
        grids = []
        strides = []
        for (hsize, wsize), stride in zip(self.hw, self.strides):
            xv, yv = meshgrid(F.arange(hsize), F.arange(wsize))
            grid = F.stack((xv, yv), 2).reshape(1, -1, 2)
            grids.append(grid)
            shape = grid.shape[:2]
            strides.append(F.full((*shape, 1), stride))

        grids = F.concat(grids, axis=1)
        strides = F.concat(strides, axis=1)

        outputs[..., :2] = (outputs[..., :2] + grids) * strides
        outputs[..., 2:4] = F.exp(outputs[..., 2:4]) * strides
        return outputs
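`decode_outputs` applies the YOLOX box parameterization: for a cell at grid position (gx, gy) on a level with stride s, the raw regression (tx, ty, tw, th) decodes to cx = (tx + gx) * s, cy = (ty + gy) * s, w = exp(tw) * s, h = exp(th) * s. Worked through numerically (sketch, not part of the commit):

# Sketch only: the decode formula from decode_outputs, checked by hand.
import numpy as np

stride = 8.0
gx, gy = 10.0, 4.0                     # grid cell holding this prediction
tx, ty, tw, th = 0.5, 0.25, 0.0, 1.0   # raw network outputs for that cell
cx = (tx + gx) * stride                # 84.0: box center x in input pixels
cy = (ty + gy) * stride                # 34.0
w = np.exp(tw) * stride                # 8.0
h = np.exp(th) * stride                # ~21.75
assert (cx, cy, w) == (84.0, 34.0, 8.0)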
demo/MegEngine/python/models/yolo_pafpn.py
ADDED
@@ -0,0 +1,111 @@
#!/usr/bin/env python3
# -*- encoding: utf-8 -*-
# Copyright (c) 2014-2021 Megvii Inc. All rights reserved.

import megengine.module as M
import megengine.functional as F

from .darknet import CSPDarknet
from .network_blocks import BaseConv, CSPLayer, DWConv, UpSample


class YOLOPAFPN(M.Module):
    """
    YOLOv3 model. Darknet 53 is the default backbone of this model.
    """

    def __init__(
        self, depth=1.0, width=1.0, in_features=("dark3", "dark4", "dark5"),
        in_channels=[256, 512, 1024], depthwise=False, act="silu",
    ):
        super().__init__()
        self.backbone = CSPDarknet(depth, width, depthwise=depthwise, act=act)
        self.in_features = in_features
        self.in_channels = in_channels
        Conv = DWConv if depthwise else BaseConv

        self.upsample = UpSample(scale_factor=2, mode="bilinear")
        self.lateral_conv0 = BaseConv(
            int(in_channels[2] * width), int(in_channels[1] * width), 1, 1, act=act
        )
        self.C3_p4 = CSPLayer(
            int(2 * in_channels[1] * width),
            int(in_channels[1] * width),
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )  # cat

        self.reduce_conv1 = BaseConv(
            int(in_channels[1] * width), int(in_channels[0] * width), 1, 1, act=act
        )
        self.C3_p3 = CSPLayer(
            int(2 * in_channels[0] * width),
            int(in_channels[0] * width),
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )

        # bottom-up conv
        self.bu_conv2 = Conv(
            int(in_channels[0] * width), int(in_channels[0] * width), 3, 2, act=act
        )
        self.C3_n3 = CSPLayer(
            int(2 * in_channels[0] * width),
            int(in_channels[1] * width),
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )

        # bottom-up conv
        self.bu_conv1 = Conv(
            int(in_channels[1] * width), int(in_channels[1] * width), 3, 2, act=act
        )
        self.C3_n4 = CSPLayer(
            int(2 * in_channels[1] * width),
            int(in_channels[2] * width),
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )

    def forward(self, input):
        """
        Args:
            input: input images.

        Returns:
            Tuple[Tensor]: FPN feature.
        """

        # backbone
        out_features = self.backbone(input)
        features = [out_features[f] for f in self.in_features]
        [x2, x1, x0] = features

        fpn_out0 = self.lateral_conv0(x0)  # 1024->512/32
        f_out0 = self.upsample(fpn_out0)  # 512/16
        f_out0 = F.concat([f_out0, x1], 1)  # 512->1024/16
        f_out0 = self.C3_p4(f_out0)  # 1024->512/16

        fpn_out1 = self.reduce_conv1(f_out0)  # 512->256/16
        f_out1 = self.upsample(fpn_out1)  # 256/8
        f_out1 = F.concat([f_out1, x2], 1)  # 256->512/8
        pan_out2 = self.C3_p3(f_out1)  # 512->256/8

        p_out1 = self.bu_conv2(pan_out2)  # 256->256/16
        p_out1 = F.concat([p_out1, fpn_out1], 1)  # 256->512/16
        pan_out1 = self.C3_n3(p_out1)  # 512->512/16

        p_out0 = self.bu_conv1(pan_out1)  # 512->512/32
        p_out0 = F.concat([p_out0, fpn_out0], 1)  # 512->1024/32
        pan_out0 = self.C3_n4(p_out0)  # 1024->1024/32

        outputs = (pan_out2, pan_out1, pan_out0)
        return outputs
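The PAFPN keeps the backbone's three strides but adds the bottom-up path; with the YOLOX-s multipliers the widths work out to half of `in_channels`. A hedged shape check (sketch, not part of the commit):

# Sketch only: YOLOX-s style PAFPN on a 640x640 dummy input.
import numpy as np
import megengine as mge

from models.yolo_pafpn import YOLOPAFPN

fpn = YOLOPAFPN(depth=0.33, width=0.50)
fpn.eval()
outs = fpn(mge.tensor(np.zeros((1, 3, 640, 640), dtype=np.float32)))
for o in outs:
    print(o.shape)  # (1, 128, 80, 80), (1, 256, 40, 40), (1, 512, 20, 20)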
demo/MegEngine/python/models/yolox.py
ADDED
@@ -0,0 +1,34 @@
#!/usr/bin/env python3
# -*- encoding: utf-8 -*-
# Copyright (c) 2014-2021 Megvii Inc. All rights reserved.

import megengine.module as M

from .yolo_head import YOLOXHead
from .yolo_pafpn import YOLOPAFPN


class YOLOX(M.Module):
    """
    YOLOX model module. The module list is defined by create_yolov3_modules function.
    The network returns loss values from three YOLO layers during training
    and detection results during test.
    """

    def __init__(self, backbone=None, head=None):
        super().__init__()
        if backbone is None:
            backbone = YOLOPAFPN()
        if head is None:
            head = YOLOXHead(80)

        self.backbone = backbone
        self.head = head

    def forward(self, x):
        # fpn output content features of [dark3, dark4, dark5]
        fpn_outs = self.backbone(x)
        assert not self.training
        outputs = self.head(fpn_outs)

        return outputs
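Assembling the full detector for inference is then a few lines. A sketch (not part of the commit; YOLOX-s numbers, weight loading omitted; see `build.py` and `convert_weights.py` in this directory for the real entry points):

# Sketch only: compose backbone + head into an inference-mode YOLOX-s.
from models.yolo_head import YOLOXHead
from models.yolo_pafpn import YOLOPAFPN
from models.yolox import YOLOX

backbone = YOLOPAFPN(depth=0.33, width=0.50)
head = YOLOXHead(num_classes=80, width=0.50)
model = YOLOX(backbone, head)
model.eval()  # forward() asserts not self.training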
demo/MegEngine/python/process.py
ADDED
@@ -0,0 +1,76 @@
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copyright (c) 2014-2021 Megvii Inc. All rights reserved.

import cv2
import megengine.functional as F
import numpy as np

__all__ = [
    "preprocess",
    "postprocess",
]


def preprocess(image, input_size, mean, std, swap=(2, 0, 1)):
    if len(image.shape) == 3:
        padded_img = np.ones((input_size[0], input_size[1], 3)) * 114.0
    else:
        padded_img = np.ones(input_size) * 114.0
    img = np.array(image)
    r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
    resized_img = cv2.resize(
        img,
        (int(img.shape[1] * r), int(img.shape[0] * r)),
        interpolation=cv2.INTER_LINEAR,
    ).astype(np.float32)
    padded_img[: int(img.shape[0] * r), : int(img.shape[1] * r)] = resized_img
    image = padded_img

    image = image.astype(np.float32)
    image = image[:, :, ::-1]
    image /= 255.0
    if mean is not None:
        image -= mean
    if std is not None:
        image /= std
    image = image.transpose(swap)
    image = np.ascontiguousarray(image, dtype=np.float32)
    return image, r


def postprocess(prediction, num_classes, conf_thre=0.7, nms_thre=0.45):
    box_corner = F.zeros_like(prediction)
    box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
    box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
    box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
    box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
    prediction[:, :, :4] = box_corner[:, :, :4]

    output = [None for _ in range(len(prediction))]
    for i, image_pred in enumerate(prediction):

        # If none are remaining => process next image
        if not image_pred.shape[0]:
            continue
        # Get score and class with highest confidence
        class_conf = F.max(image_pred[:, 5 : 5 + num_classes], 1, keepdims=True)
        class_pred = F.argmax(image_pred[:, 5 : 5 + num_classes], 1, keepdims=True)

        class_conf_squeeze = F.squeeze(class_conf)
        conf_mask = image_pred[:, 4] * class_conf_squeeze >= conf_thre
        detections = F.concat((image_pred[:, :5], class_conf, class_pred), 1)
        detections = detections[conf_mask]
        if not detections.shape[0]:
            continue

        nms_out_index = F.vision.nms(
            detections[:, :4], detections[:, 4] * detections[:, 5], nms_thre,
        )
        detections = detections[nms_out_index]
        if output[i] is None:
            output[i] = detections
        else:
            output[i] = F.concat((output[i], detections))

    return output
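End to end, `preprocess` letterboxes the image onto a gray (114) canvas and returns the resize ratio `r`, while `postprocess` converts center-size boxes to corners, filters by obj * cls confidence, and runs NMS. A hedged sketch of the intended flow (not part of the commit; `model` is the eval-mode YOLOX from the sketch above, thresholds are illustrative):

# Sketch only: preprocess -> model -> postprocess, then rescale boxes by 1/r.
import cv2
import megengine as mge

from process import preprocess, postprocess

img = cv2.imread("input.jpg")  # hypothetical input path
img_t, r = preprocess(img, (640, 640), mean=None, std=None)
out = model(mge.tensor(img_t[None, ...]))   # (1, n_anchors_all, 5 + num_classes)
dets = postprocess(out, num_classes=80, conf_thre=0.25, nms_thre=0.45)[0]
if dets is not None:
    boxes = dets[:, :4] / r                 # map back to the original image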
demo/MegEngine/python/visualize.py
ADDED
@@ -0,0 +1,128 @@
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copyright (c) 2014-2021 Megvii Inc. All rights reserved.

import cv2
import numpy as np

__all__ = ["vis"]


def vis(img, boxes, scores, cls_ids, conf=0.5, class_names=None):

    for i in range(len(boxes)):
        box = boxes[i]
        cls_id = int(cls_ids[i])
        score = scores[i]
        if score < conf:
            continue
        x0 = int(box[0])
        y0 = int(box[1])
        x1 = int(box[2])
        y1 = int(box[3])

        color = (_COLORS[cls_id] * 255).astype(np.uint8).tolist()
        text = '{}:{:.1f}%'.format(class_names[cls_id], score * 100)
        txt_color = (0, 0, 0) if np.mean(_COLORS[cls_id]) > 0.5 else (255, 255, 255)
        font = cv2.FONT_HERSHEY_SIMPLEX

        txt_size = cv2.getTextSize(text, font, 0.4, 1)[0]
        cv2.rectangle(img, (x0, y0), (x1, y1), color, 2)

        txt_bk_color = (_COLORS[cls_id] * 255 * 0.7).astype(np.uint8).tolist()
        cv2.rectangle(
            img,
            (x0, y0 + 1),
            (x0 + txt_size[0] + 1, y0 + int(1.5*txt_size[1])),
            txt_bk_color,
            -1
        )
        cv2.putText(img, text, (x0, y0 + txt_size[1]), font, 0.4, txt_color, thickness=1)

    return img


_COLORS = np.array(
    [
        0.000, 0.447, 0.741,
        0.850, 0.325, 0.098,
        0.929, 0.694, 0.125,
        0.494, 0.184, 0.556,
        0.466, 0.674, 0.188,
        0.301, 0.745, 0.933,
        0.635, 0.078, 0.184,
        0.300, 0.300, 0.300,
        0.600, 0.600, 0.600,
        1.000, 0.000, 0.000,
        1.000, 0.500, 0.000,
        0.749, 0.749, 0.000,
        0.000, 1.000, 0.000,
        0.000, 0.000, 1.000,
        0.667, 0.000, 1.000,
        0.333, 0.333, 0.000,
        0.333, 0.667, 0.000,
        0.333, 1.000, 0.000,
        0.667, 0.333, 0.000,
        0.667, 0.667, 0.000,
        0.667, 1.000, 0.000,
        1.000, 0.333, 0.000,
        1.000, 0.667, 0.000,
        1.000, 1.000, 0.000,
        0.000, 0.333, 0.500,
        0.000, 0.667, 0.500,
        0.000, 1.000, 0.500,
        0.333, 0.000, 0.500,
        0.333, 0.333, 0.500,
        0.333, 0.667, 0.500,
        0.333, 1.000, 0.500,
        0.667, 0.000, 0.500,
        0.667, 0.333, 0.500,
        0.667, 0.667, 0.500,
        0.667, 1.000, 0.500,
        1.000, 0.000, 0.500,
        1.000, 0.333, 0.500,
        1.000, 0.667, 0.500,
        1.000, 1.000, 0.500,
        0.000, 0.333, 1.000,
        0.000, 0.667, 1.000,
        0.000, 1.000, 1.000,
        0.333, 0.000, 1.000,
        0.333, 0.333, 1.000,
        0.333, 0.667, 1.000,
        0.333, 1.000, 1.000,
        0.667, 0.000, 1.000,
        0.667, 0.333, 1.000,
        0.667, 0.667, 1.000,
        0.667, 1.000, 1.000,
        1.000, 0.000, 1.000,
        1.000, 0.333, 1.000,
        1.000, 0.667, 1.000,
        0.333, 0.000, 0.000,
        0.500, 0.000, 0.000,
        0.667, 0.000, 0.000,
        0.833, 0.000, 0.000,
        1.000, 0.000, 0.000,
        0.000, 0.167, 0.000,
        0.000, 0.333, 0.000,
        0.000, 0.500, 0.000,
        0.000, 0.667, 0.000,
        0.000, 0.833, 0.000,
        0.000, 1.000, 0.000,
        0.000, 0.000, 0.167,
        0.000, 0.000, 0.333,
        0.000, 0.000, 0.500,
        0.000, 0.000, 0.667,
        0.000, 0.000, 0.833,
        0.000, 0.000, 1.000,
        0.000, 0.000, 0.000,
        0.143, 0.143, 0.143,
        0.286, 0.286, 0.286,
        0.429, 0.429, 0.429,
        0.571, 0.571, 0.571,
        0.714, 0.714, 0.714,
        0.857, 0.857, 0.857,
        0.000, 0.447, 0.741,
        0.314, 0.717, 0.741,
        0.50, 0.5, 0
    ]
).astype(np.float32).reshape(-1, 3)
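Continuing the process.py sketch above, the detections can be drawn with `vis`; the columns of each detection row are [x0, y0, x1, y1, obj_conf, cls_conf, cls_id]. A sketch (not part of the commit; the `COCO_CLASSES` name from `coco_classes.py` is assumed):

# Sketch only: draw postprocess() results, reusing img, r and dets from above.
import cv2

from coco_classes import COCO_CLASSES  # assumed export of coco_classes.py
from visualize import vis

dets_np = dets.numpy()
result = vis(
    img,
    boxes=dets_np[:, :4] / r,
    scores=dets_np[:, 4] * dets_np[:, 5],
    cls_ids=dets_np[:, 6],
    conf=0.35,
    class_names=COCO_CLASSES,
)
cv2.imwrite("result.jpg", result)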