Commit 5b8ab8f committed · Parent(s): e9faa7e

feat(YOLOX): update README and fix several bugs.

Files changed:
- README.md +67 -47
- demo/ONNXRuntime/README.md +14 -13
- demo/OpenVINO/README.md +3 -3
- demo/OpenVINO/cpp/README.md +11 -11
- demo/OpenVINO/python/README.md +12 -12
- demo/TensorRT/cpp/README.md +2 -2
- demo/TensorRT/python/README.md +5 -5
- docs/.gitkeep +0 -0
- docs/train_custom_data.md +118 -0
- exps/example/yolox_voc/yolox_voc_s.py +124 -0
- requirements.txt +3 -0
- tools/demo.py +4 -9
- yolox/data/datasets/coco.py +23 -13
- yolox/data/datasets/mosaicdetection.py +1 -4
- yolox/data/datasets/voc.py +24 -69
- yolox/{evalutors → evaluators}/__init__.py +0 -0
- yolox/{evalutors → evaluators}/coco_evaluator.py +0 -0
- yolox/{evalutors → evaluators}/voc_eval.py +0 -0
- yolox/evaluators/voc_evaluator.py +183 -0
- yolox/evalutors/voc_evaluator.py +0 -202
- yolox/models/yolo_head.py +8 -1
- yolox/utils/visualize.py +2 -2
README.md
CHANGED
@@ -1,37 +1,35 @@
|
|
1 |
-
<div align="center"><img src="assets/logo.png" width="
|
2 |
-
|
3 |
<img src="assets/demo.png" >
|
4 |
|
5 |
-
##
|
6 |
YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and industrial communities.
|
7 |
|
|
|
8 |
|
9 |
-
##
|
10 |
-
|
11 |
-
<div align="center"><img src="assets/fig1.png" width="400" ><img src="assets/fig2.png" width="400"></div>
|
12 |
-
|
13 |
-
## <div align="center">News!!</div>
|
14 |
-
* 【2020/07/19】 We have released our technical report on [Arxiv](xxx)!!
|
15 |
|
16 |
-
##
|
17 |
|
18 |
-
|
19 |
|Model |size |mAP<sup>test<br>0.5:0.95 | Speed V100<br>(ms) | Params<br>(M) |FLOPs<br>(B)| weights |
|
20 |
| ------ |:---: | :---: |:---: |:---: | :---: | :----: |
|
21 |
-
|[YOLOX-s]() |640 |39.6 |9.8 |9.0 | 26.8 | - |
|
22 |
-
|[YOLOX-m]() |640 |46.4 |12.3 |25.3 |73.8| - |
|
23 |
-
|[YOLOX-l]() |640 |50.0 |14.5 |54.2| 155.6 | - |
|
24 |
-
|[YOLOX-x]() |640 |**51.2** | 17.3 |99.1 |281.9 | - |
|
|
|
25 |
|
26 |
-
|
27 |
-
|Model |size |mAP<sup>val<br>0.5:0.95 |
|
28 |
-
| ------ |:---: | :---: |:---: |:---: | :---: |
|
29 |
-
|[YOLOX-Nano]() |416 |25.3
|
30 |
-
|[YOLOX-Tiny]() |416 |31.7
|
31 |
|
32 |
-
##
|
33 |
|
34 |
-
|
|
|
35 |
|
36 |
Step1. Install [apex](https://github.com/NVIDIA/apex).
|
37 |
|
@@ -47,25 +45,41 @@ $ cd yolox
|
|
47 |
$ pip3 install -v -e . # or "python3 setup.py develop
|
48 |
```
|
49 |
|
50 |
-
|
|
|
|
|
|
|
|
|
|
|
51 |
|
52 |
-
|
53 |
|
54 |
```shell
|
55 |
-
python tools/demo.py -n yolox-s -c
|
56 |
```
|
57 |
or
|
58 |
```shell
|
59 |
-
python tools/demo.py -f exps/
|
|
|
|
|
|
|
|
|
60 |
```
|
61 |
|
62 |
|
63 |
-
|
|
|
|
|
64 |
<summary>Reproduce our results on COCO</summary>
|
65 |
|
66 |
-
Step1.
|
|
|
|
|
|
|
|
|
|
|
67 |
|
68 |
-
|
69 |
|
70 |
```shell
|
71 |
python tools/train.py -n yolox-s -d 8 -b 64 --fp16 -o
|
@@ -73,12 +87,11 @@ python tools/train.py -n yolox-s -d 8 -b 64 --fp16 -o
|
|
73 |
yolox-l
|
74 |
yolox-x
|
75 |
```
|
76 |
-
Notes:
|
77 |
* -d: number of gpu devices
|
78 |
-
* -b: total batch size, the recommended number for -b
|
79 |
* --fp16: mixed precision training
|
80 |
|
81 |
-
|
82 |
|
83 |
```shell
|
84 |
python tools/train.py -f exps/base/yolox-s.py -d 8 -b 64 --fp16 -o
|
@@ -87,42 +100,49 @@ python tools/train.py -f exps/base/yolox-s.py -d 8 -b 64 --fp16 -o
|
|
87 |
exps/base/yolox-x.py
|
88 |
```
|
89 |
|
90 |
-
* Customize your training.
|
91 |
-
|
92 |
-
* Finetune your datset on COCO pretrained models.
|
93 |
</details>
|
94 |
|
95 |
-
|
|
|
96 |
<summary>Evaluation</summary>
|
|
|
97 |
We support batch testing for fast evaluation:
|
98 |
|
99 |
```shell
|
100 |
-
python tools/eval.py -n yolox-s -b 64 --conf 0.001 --fp16
|
101 |
yolox-m
|
102 |
yolox-l
|
103 |
yolox-x
|
104 |
```
|
|
|
|
|
|
|
105 |
|
106 |
To reproduce speed test, we use the following command:
|
107 |
```shell
|
108 |
-
python tools/eval.py -n yolox-s -b 1 -d
|
109 |
yolox-m
|
110 |
yolox-l
|
111 |
yolox-x
|
112 |
```
|
113 |
|
114 |
-
## <div align="center">Deployment</div>
|
115 |
-
|
116 |
</details>
|
117 |
|
118 |
-
1. [ONNX: Including ONNX export and an ONNXRuntime demo.]()
|
119 |
-
2. [TensorRT in both C++ and Python]()
|
120 |
-
3. [NCNN in C++]()
|
121 |
-
4. [OpenVINO in both C++ and Python]()
|
122 |
|
123 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
124 |
|
125 |
|
126 |
-
|
|
|
|
|
|
|
127 |
|
128 |
-
|
|
|
|
1 |
+
<div align="center"><img src="assets/logo.png" width="350"></div>
|
|
|
2 |
<img src="assets/demo.png" >
|
3 |
|
4 |
+
## Introduction
|
5 |
YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and industrial communities.
|
6 |
|
7 |
+
<img src="assets/git_fig.png" width="1000" >
|
8 |
|
9 |
+
## Updates!!
|
10 |
+
* 【2020/07/19】 We have released our technical report on Arxiv.
|
|
|
|
|
|
|
|
|
11 |
|
12 |
+
## Benchmark
|
13 |
|
14 |
+
#### Standard Models.
|
15 |
|Model |size |mAP<sup>test<br>0.5:0.95 | Speed V100<br>(ms) | Params<br>(M) |FLOPs<br>(B)| weights |
|
16 |
| ------ |:---: | :---: |:---: |:---: | :---: | :----: |
|
17 |
+
|[YOLOX-s](./exps/yolox_s.py) |640 |39.6 |9.8 |9.0 | 26.8 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EW62gmO2vnNNs5npxjzunVwB9p307qqygaCkXdTO88BLUg?e=NMTQYw) |
|
18 |
+
|[YOLOX-m](./exps/yolox_m.py) |640 |46.4 |12.3 |25.3 |73.8| [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/ERMTP7VFqrVBrXKMU7Vl4TcBQs0SUeCT7kvc-JdIbej4tQ?e=1MDo9y) |
|
19 |
+
|[YOLOX-l](./exps/yolox_l.py) |640 |50.0 |14.5 |54.2| 155.6 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EWA8w_IEOzBKvuueBqfaZh0BeoG5sVzR-XYbOJO4YlOkRw?e=wHWOBE) |
|
20 |
+
|[YOLOX-x](./exps/yolox_x.py) |640 |**51.2** | 17.3 |99.1 |281.9 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EdgVPHBziOVBtGAXHfeHI5kBza0q9yyueMGdT0wXZfI1rQ?e=tABO5u) |
|
21 |
+
|[YOLOX-Darknet53](./exps/yolov3.py) |640 | 47.4 | 11.1 |63.7 | 185.3 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EZ-MV1r_fMFPkPrNjvbJEMoBLOLAnXH-XKEB77w8LhXL6Q?e=mf6wOc) |
|
22 |
|
23 |
+
#### Light Models.
|
24 |
+
|Model |size |mAP<sup>val<br>0.5:0.95 | Params<br>(M) |FLOPs<br>(B)| weights |
|
25 |
+
| ------ |:---: | :---: |:---: |:---: | :---: |
|
26 |
+
|[YOLOX-Nano](./exps/nano.py) |416 |25.3 | 0.91 |1.08 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EdcREey-krhLtdtSnxolxiUBjWMy6EFdiaO9bdOwZ5ygCQ?e=yQpdds) |
|
27 |
+
|[YOLOX-Tiny](./exps/yolox_tiny.py) |416 |31.7 | 5.06 |6.45 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EYtjNFPqvZBBrQ-VowLcSr4B6Z5TdTflUsr_gO2CwhC3bQ?e=SBTwXj) |
|
28 |
|
29 |
+
## Quick Start
|
30 |
|
31 |
+
<details>
|
32 |
+
<summary>Installation</summary>
|
33 |
|
34 |
Step1. Install [apex](https://github.com/NVIDIA/apex).
|
35 |
|
|
|
45 |
$ pip3 install -v -e . # or "python3 setup.py develop
|
46 |
```
|
47 |
|
48 |
+
</details>
|
49 |
+
|
50 |
+
<details>
|
51 |
+
<summary>Demo</summary>
|
52 |
+
|
53 |
+
Step1. Download a pretrained model from the benchmark table.
|
54 |
|
55 |
+
Step2. Use either -n or -f to specify your detector's config. For example:
|
56 |
|
57 |
```shell
|
58 |
+
python tools/demo.py image -n yolox-s -c /path/to/your/yolox_s.pth.tar --path assets/dog.jpg --conf 0.3 --nms 0.65 --tsize 640 --save_result
|
59 |
```
|
60 |
or
|
61 |
```shell
|
62 |
+
python tools/demo.py image -f exps/yolox_s.py -c /path/to/your/yolox_s.pth.tar --path assets/dog.jpg --conf 0.3 --nms 0.65 --tsize 640 --save_result
|
63 |
+
```
|
64 |
+
Demo for video:
|
65 |
+
```shell
|
66 |
+
python tools/demo.py video -n yolox-s -c /path/to/your/yolox_s.pth.tar --path /path/to/your/video --conf 0.3 --nms 0.65 --tsize 640 --save_result
|
67 |
```
|
68 |
|
69 |
|
70 |
+
</details>
|
71 |
+
|
72 |
+
<details>
|
73 |
<summary>Reproduce our results on COCO</summary>
|
74 |
|
75 |
+
Step1. Prepare dataset
|
76 |
+
```shell
|
77 |
+
cd <YOLOX_HOME>
|
78 |
+
mkdir datasets
|
79 |
+
ln -s /path/to/your/COCO ./datasets/COCO
|
80 |
+
```
|
81 |
|
82 |
+
Step2. Reproduce our results on COCO by specifying -n:
|
83 |
|
84 |
```shell
|
85 |
python tools/train.py -n yolox-s -d 8 -b 64 --fp16 -o
|
|
|
87 |
yolox-l
|
88 |
yolox-x
|
89 |
```
|
|
|
90 |
* -d: number of gpu devices
|
91 |
+
* -b: total batch size, the recommended number for -b is num_gpu * 8
|
92 |
* --fp16: mixed precision training
|
93 |
|
94 |
+
When using -f, the above commands are equivalent to:
|
95 |
|
96 |
```shell
|
97 |
python tools/train.py -f exps/base/yolox-s.py -d 8 -b 64 --fp16 -o
|
|
|
100 |
exps/base/yolox-x.py
|
101 |
```
|
102 |
|
|
|
|
|
|
|
103 |
</details>
|
104 |
|
105 |
+
|
106 |
+
<details>
|
107 |
<summary>Evaluation</summary>
|
108 |
+
|
109 |
We support batch testing for fast evaluation:
|
110 |
|
111 |
```shell
|
112 |
+
python tools/eval.py -n yolox-s -c yolox_s.pth.tar -b 64 -d 8 --conf 0.001 [--fp16] [--fuse]
|
113 |
yolox-m
|
114 |
yolox-l
|
115 |
yolox-x
|
116 |
```
|
117 |
+
* --fuse: fuse conv and bn
|
118 |
+
* -d: number of GPUs used for evaluation. DEFAULT: All GPUs available will be used.
|
119 |
+
* -b: total batch size across all GPUs
|
120 |
|
121 |
To reproduce speed test, we use the following command:
|
122 |
```shell
|
123 |
+
python tools/eval.py -n yolox-s -c yolox_s.pth.tar -b 1 -d 1 --conf 0.001 --fp16 --fuse
|
124 |
yolox-m
|
125 |
yolox-l
|
126 |
yolox-x
|
127 |
```
|
128 |
|
|
|
|
|
129 |
</details>
|
130 |
|
|
|
|
|
|
|
|
|
131 |
|
132 |
+
<details open>
|
133 |
+
<summary>Tutorials</summary>
|
134 |
+
|
135 |
+
* [Training on custom data](docs/train_custom_data.md).
|
136 |
+
|
137 |
+
</details>
|
138 |
+
|
139 |
+
## Deployment
|
140 |
|
141 |
|
142 |
+
1. [ONNX: Including ONNX export and an ONNXRuntime demo.](./demo/ONNXRuntime)
|
143 |
+
2. [TensorRT in both C++ and Python](./demo/TensorRT)
|
144 |
+
3. [NCNN in C++](./demo/ncnn/android)
|
145 |
+
4. [OpenVINO in both C++ and Python](./demo/OpenVINO)
|
146 |
|
147 |
+
## Citing YOLOX
|
148 |
+
If you use YOLOX in your research, please cite our work by using the following BibTeX entry:
|
demo/ONNXRuntime/README.md
CHANGED
@@ -1,17 +1,18 @@
|
|
1 |
-
## ONNXRuntime
|
2 |
|
3 |
This doc introduces how to convert your PyTorch model to ONNX and how to run an ONNXRuntime demo to verify your conversion.
|
4 |
|
5 |
### Download ONNX models.
|
6 |
-
| Model | Parameters | GFLOPs | Test Size | mAP |
|
7 |
-
|:------| :----: | :----: | :---: | :---: |
|
8 |
-
|
|
9 |
-
|
|
10 |
-
|
|
11 |
-
|
|
12 |
-
|
|
13 |
-
|
|
14 |
-
|
|
|
|
15 |
|
16 |
### Convert Your Model to ONNX
|
17 |
|
@@ -28,7 +29,7 @@ python3 tools/export_onnx.py --output-name yolox_s.onnx -n yolox-s -c yolox_s.pt
|
|
28 |
Notes:
|
29 |
* -n: specify a model name. The model name must be one of the [yolox-s,m,l,x and yolox-nano, yolox-tiny, yolov3]
|
30 |
* -c: the model you have trained
|
31 |
-
* -o: opset version, default 11. **However, if you will further convert your onnx model to [OpenVINO](), please specify the opset version to 10.**
|
32 |
* --no-onnxsim: disable onnxsim
|
33 |
* To customize an input shape for onnx model, modify the following code in tools/export.py:
|
34 |
|
@@ -36,7 +37,7 @@ Notes:
|
|
36 |
dummy_input = torch.randn(1, 3, exp.test_size[0], exp.test_size[1])
|
37 |
```
|
38 |
|
39 |
-
2. Convert a standard YOLOX model by -f.
|
40 |
|
41 |
```shell
|
42 |
python3 tools/export_onnx.py --output-name yolox_s.onnx -f exps/yolox_s.py -c yolox_s.pth.tar
|
@@ -52,7 +53,7 @@ python3 tools/export_onnx.py --output-name your_yolox.onnx -f exps/your_yolox.py
|
|
52 |
|
53 |
Step1.
|
54 |
```shell
|
55 |
-
cd <YOLOX_HOME>/
|
56 |
```
|
57 |
|
58 |
Step2.
|
|
|
1 |
+
## YOLOX-ONNXRuntime in Python
|
2 |
|
3 |
This doc introduces how to convert your PyTorch model to ONNX and how to run an ONNXRuntime demo to verify your conversion.
|
4 |
|
5 |
### Download ONNX models.
|
6 |
+
| Model | Parameters | GFLOPs | Test Size | mAP | Weights |
|
7 |
+
|:------| :----: | :----: | :---: | :---: | :---: |
|
8 |
+
| YOLOX-Nano | 0.91M | 1.08 | 416x416 | 25.3 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EfAGwvevU-lNhW5OqFAyHbwBJdI_7EaKu5yU04fgF5BU7w?e=gvq4hf) |
|
9 |
+
| YOLOX-Tiny | 5.06M | 6.45 | 416x416 |31.7 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EVigCszU1ilDn-MwLwHCF1ABsgTy06xFdVgZ04Yyo4lHVA?e=hVKiCw) |
|
10 |
+
| YOLOX-S | 9.0M | 26.8 | 640x640 |39.6 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/Ec0L1d1x2UtIpbfiahgxhtgBZVjb1NCXbotO8SCOdMqpQQ?e=siyIsK) |
|
11 |
+
| YOLOX-M | 25.3M | 73.8 | 640x640 |46.4 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/ERUKlQe-nlxBoTKPy1ynbxsBmAZ_h-VBEV-nnfPdzUIkZQ?e=hyQQtl) |
|
12 |
+
| YOLOX-L | 54.2M | 155.6 | 640x640 |50.0 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/ET5w926jCA5GlVfg9ixB4KEBiW0HYl7SzaHNRaRG9dYO_A?e=ISmCYX) |
|
13 |
+
| YOLOX-Darknet53| 63.72M | 185.3 | 640x640 |47.3 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/ESArloSW-MlPlLuemLh9zKkBdovgweKbfu4zkvzKAp7pPQ?e=f81Ikw) |
|
14 |
+
| YOLOX-X | 99.1M | 281.9 | 640x640 |51.2 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/ERjqoeMJlFdGuM3tQfXQmhABmGHlIHydWCwhlugeWLE9AA) |
|
15 |
+
|
16 |
|
17 |
### Convert Your Model to ONNX
|
18 |
|
|
|
29 |
Notes:
|
30 |
* -n: specify a model name. The model name must be one of the [yolox-s,m,l,x and yolox-nano, yolox-tiny, yolov3]
|
31 |
* -c: the model you have trained
|
32 |
+
* -o: opset version, default 11. **However, if you will further convert your onnx model to [OpenVINO](../OpenVINO/), please specify the opset version to 10.**
|
33 |
* --no-onnxsim: disable onnxsim
|
34 |
* To customize an input shape for onnx model, modify the following code in tools/export.py:
|
35 |
|
|
|
37 |
dummy_input = torch.randn(1, 3, exp.test_size[0], exp.test_size[1])
|
38 |
```
|
39 |
|
40 |
+
2. Convert a standard YOLOX model by -f. When using -f, the above command is equivalent to:
|
41 |
|
42 |
```shell
|
43 |
python3 tools/export_onnx.py --output-name yolox_s.onnx -f exps/yolox_s.py -c yolox_s.pth.tar
|
|
|
53 |
|
54 |
Step1.
|
55 |
```shell
|
56 |
+
cd <YOLOX_HOME>/demo/ONNXRuntime
|
57 |
```
|
58 |
|
59 |
Step2.
|
demo/OpenVINO/README.md
CHANGED
@@ -1,4 +1,4 @@
|
|
1 |
-
## YOLOX
|
2 |
|
3 |
-
* [C++ Demo]()
|
4 |
-
* [Python Demo]()
|
|
|
1 |
+
## YOLOX for OpenVINO
|
2 |
|
3 |
+
* [C++ Demo](./cpp)
|
4 |
+
* [Python Demo](./python)
|
demo/OpenVINO/cpp/README.md
CHANGED
@@ -1,17 +1,17 @@
|
|
1 |
-
#
|
2 |
|
3 |
This tutorial includes a C++ demo for OpenVINO, as well as some converted models.
|
4 |
|
5 |
### Download OpenVINO models.
|
6 |
-
| Model | Parameters | GFLOPs | Test Size | mAP |
|
7 |
-
|:------| :----: | :----: | :---: | :---: |
|
8 |
-
| [YOLOX-Nano](
|
9 |
-
| [YOLOX-Tiny](
|
10 |
-
| [YOLOX-S](
|
11 |
-
| [YOLOX-M](
|
12 |
-
| [YOLOX-L](
|
13 |
-
| [YOLOX-
|
14 |
-
| [YOLOX-
|
15 |
|
16 |
## Install OpenVINO Toolkit
|
17 |
|
@@ -51,7 +51,7 @@ source ~/.bashrc
|
|
51 |
|
52 |
1. Export ONNX model
|
53 |
|
54 |
-
Please refer to the [ONNX toturial]()
|
55 |
|
56 |
2. Convert ONNX to OpenVINO
|
57 |
|
|
|
1 |
+
# YOLOX-OpenVINO in C++
|
2 |
|
3 |
This tutorial includes a C++ demo for OpenVINO, as well as some converted models.
|
4 |
|
5 |
### Download OpenVINO models.
|
6 |
+
| Model | Parameters | GFLOPs | Test Size | mAP | Weights |
|
7 |
+
|:------| :----: | :----: | :---: | :---: | :---: |
|
8 |
+
| [YOLOX-Nano](../../../exps/nano.py) | 0.91M | 1.08 | 416x416 | 25.3 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EeWY57o5wQZFtXYd1KJw6Z8B4vxZru649XxQHYIFgio3Qw?e=ZS81ce) |
|
9 |
+
| [YOLOX-Tiny](../../../exps/yolox_tiny.py) | 5.06M | 6.45 | 416x416 |31.7 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/ETfvOoCXdVZNinoSpKA_sEYBIQVqfjjF5_M6VvHRnLVcsA?e=STL1pi) |
|
10 |
+
| [YOLOX-S](../../../exps/yolox_s.py) | 9.0M | 26.8 | 640x640 |39.6 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EXUjf3PQnbBLrxNrXPueqaIBzVZOrYQOnJpLK1Fytj5ssA?e=GK0LOM) |
|
11 |
+
| [YOLOX-M](../../../exps/yolox_m.py) | 25.3M | 73.8 | 640x640 |46.4 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EcoT1BPpeRpLvE_4c441zn8BVNCQ2naxDH3rho7WqdlgLQ?e=95VaM9) |
|
12 |
+
| [YOLOX-L](../../../exps/yolox_l.py) | 54.2M | 155.6 | 640x640 |50.0 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EZvmn-YLRuVPh0GAP_w3xHMB2VGvrKqQXyK_Cv5yi_DXUg?e=YRh6Eq) |
|
13 |
+
| [YOLOX-Darknet53](../../../exps/yolov3.py) | 63.72M | 185.3 | 640x640 |47.3 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EQP8LSroikFHuwX0jFRetmcBOCDWSFmylHxolV7ezUPXGw?e=bEw5iq) |
|
14 |
+
| [YOLOX-X](../../../exps/yolox_x.py) | 99.1M | 281.9 | 640x640 |51.2 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EZFPnLqiD-xIlt7rcZYDjQgB4YXE9wnq1qaSXQwJrsKbdg?e=83nwEz) |
|
15 |
|
16 |
## Install OpenVINO Toolkit
|
17 |
|
|
|
51 |
|
52 |
1. Export ONNX model
|
53 |
|
54 |
+
Please refer to the [ONNX toturial](../../ONNXRuntime). **Note that you should set --opset to 10, otherwise your next step will fail.**
|
55 |
|
56 |
2. Convert ONNX to OpenVINO
|
57 |
|
demo/OpenVINO/python/README.md
CHANGED
@@ -1,17 +1,17 @@
|
|
1 |
-
#
|
2 |
|
3 |
This tutorial includes a Python demo for OpenVINO, as well as some converted models.
|
4 |
|
5 |
### Download OpenVINO models.
|
6 |
-
| Model | Parameters | GFLOPs | Test Size | mAP |
|
7 |
-
|:------| :----: | :----: | :---: | :---: |
|
8 |
-
| [YOLOX-Nano](
|
9 |
-
| [YOLOX-Tiny](
|
10 |
-
| [YOLOX-S](
|
11 |
-
| [YOLOX-M](
|
12 |
-
| [YOLOX-L](
|
13 |
-
| [YOLOX-
|
14 |
-
| [YOLOX-
|
15 |
|
16 |
## Install OpenVINO Toolkit
|
17 |
|
@@ -51,7 +51,7 @@ source ~/.bashrc
|
|
51 |
|
52 |
1. Export ONNX model
|
53 |
|
54 |
-
Please refer to the [ONNX toturial]()
|
55 |
|
56 |
2. Convert ONNX to OpenVINO
|
57 |
|
@@ -71,7 +71,7 @@ source ~/.bashrc
|
|
71 |
```
|
72 |
For example:
|
73 |
```shell
|
74 |
-
python3 mo.py --input_model yolox.onnx --input_shape
|
75 |
```
|
76 |
|
77 |
## Demo
|
|
|
1 |
+
# YOLOX-OpenVINO in Python
|
2 |
|
3 |
This tutorial includes a Python demo for OpenVINO, as well as some converted models.
|
4 |
|
5 |
### Download OpenVINO models.
|
6 |
+
| Model | Parameters | GFLOPs | Test Size | mAP | Weights |
|
7 |
+
|:------| :----: | :----: | :---: | :---: | :---: |
|
8 |
+
| [YOLOX-Nano](../../../exps/nano.py) | 0.91M | 1.08 | 416x416 | 25.3 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EeWY57o5wQZFtXYd1KJw6Z8B4vxZru649XxQHYIFgio3Qw?e=ZS81ce) |
|
9 |
+
| [YOLOX-Tiny](../../../exps/yolox_tiny.py) | 5.06M | 6.45 | 416x416 |31.7 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/ETfvOoCXdVZNinoSpKA_sEYBIQVqfjjF5_M6VvHRnLVcsA?e=STL1pi) |
|
10 |
+
| [YOLOX-S](../../../exps/yolox_s.py) | 9.0M | 26.8 | 640x640 |39.6 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EXUjf3PQnbBLrxNrXPueqaIBzVZOrYQOnJpLK1Fytj5ssA?e=GK0LOM) |
|
11 |
+
| [YOLOX-M](../../../exps/yolox_m.py) | 25.3M | 73.8 | 640x640 |46.4 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EcoT1BPpeRpLvE_4c441zn8BVNCQ2naxDH3rho7WqdlgLQ?e=95VaM9) |
|
12 |
+
| [YOLOX-L](../../../exps/yolox_l.py) | 54.2M | 155.6 | 640x640 |50.0 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EZvmn-YLRuVPh0GAP_w3xHMB2VGvrKqQXyK_Cv5yi_DXUg?e=YRh6Eq) |
|
13 |
+
| [YOLOX-Darknet53](../../../exps/yolov3.py) | 63.72M | 185.3 | 640x640 |47.3 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EQP8LSroikFHuwX0jFRetmcBOCDWSFmylHxolV7ezUPXGw?e=bEw5iq) |
|
14 |
+
| [YOLOX-X](../../../exps/yolox_x.py) | 99.1M | 281.9 | 640x640 |51.2 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EZFPnLqiD-xIlt7rcZYDjQgB4YXE9wnq1qaSXQwJrsKbdg?e=83nwEz) |
|
15 |
|
16 |
## Install OpenVINO Toolkit
|
17 |
|
|
|
51 |
|
52 |
1. Export ONNX model
|
53 |
|
54 |
+
Please refer to the [ONNX toturial](../../ONNXRuntime). **Note that you should set --opset to 10, otherwise your next step will fail.**
|
55 |
|
56 |
2. Convert ONNX to OpenVINO
|
57 |
|
|
|
71 |
```
|
72 |
For example:
|
73 |
```shell
|
74 |
+
python3 mo.py --input_model yolox.onnx --input_shape [1,3,640,640] --data_type FP16 --output_dir converted_output
|
75 |
```
|
76 |
|
77 |
## Demo
|
demo/TensorRT/cpp/README.md
CHANGED
@@ -1,4 +1,4 @@
|
|
1 |
-
#
|
2 |
|
3 |
As YOLOX models are easy to convert to TensorRT using the [torch2trt repo](https://github.com/NVIDIA-AI-IOT/torch2trt),
our C++ demo does not include model conversion or construction like other TensorRT demos.
|
@@ -6,7 +6,7 @@ our C++ demo will not include the model converting or constructing like other te
|
|
6 |
|
7 |
## Step 1: Prepare serialized engine file
|
8 |
|
9 |
-
Follow the trt [python demo README](../
|
10 |
|
11 |
|
12 |
## Step 2: build the demo
|
|
|
1 |
+
# YOLOX-TensorRT in C++
|
2 |
|
3 |
As YOLOX models are easy to convert to TensorRT using the [torch2trt repo](https://github.com/NVIDIA-AI-IOT/torch2trt),
our C++ demo does not include model conversion or construction like other TensorRT demos.
|
|
|
6 |
|
7 |
## Step 1: Prepare serialized engine file
|
8 |
|
9 |
+
Follow the trt [python demo README](../python/README.md) to convert and save the serialized engine file.
|
10 |
|
11 |
|
12 |
## Step 2: build the demo
|
demo/TensorRT/python/README.md
CHANGED
@@ -1,4 +1,4 @@
|
|
1 |
-
#
|
2 |
|
3 |
This tutorial includes a Python demo for TensorRT.
|
4 |
|
@@ -12,21 +12,21 @@ YOLOX models can be easily converted to TensorRT models using torch2trt
|
|
12 |
|
13 |
If you want to convert our model, use the flag -n to specify a model name:
|
14 |
```shell
|
15 |
-
python tools/
|
16 |
```
|
17 |
For example:
|
18 |
```shell
|
19 |
-
python tools/
|
20 |
```
|
21 |
<YOLOX_MODEL_NAME> can be: yolox-nano, yolox-tiny, yolox-s, yolox-m, yolox-l, yolox-x.
|
22 |
|
23 |
If you want to convert your customized model, use the flag -f to specify your exp file:
|
24 |
```shell
|
25 |
-
python tools/
|
26 |
```
|
27 |
For example:
|
28 |
```shell
|
29 |
-
python tools/
|
30 |
```
|
31 |
*yolox_s.py* can be any exp file modified by you.
|
32 |
|
|
|
1 |
+
# YOLOX-TensorRT in Python
|
2 |
|
3 |
This tutorial includes a Python demo for TensorRT.
|
4 |
|
|
|
12 |
|
13 |
If you want to convert our model, use the flag -n to specify a model name:
|
14 |
```shell
|
15 |
+
python tools/trt.py -n <YOLOX_MODEL_NAME> -c <YOLOX_CHECKPOINT>
|
16 |
```
|
17 |
For example:
|
18 |
```shell
|
19 |
+
python tools/trt.py -n yolox-s -c your_ckpt.pth.tar
|
20 |
```
|
21 |
<YOLOX_MODEL_NAME> can be: yolox-nano, yolox-tiny, yolox-s, yolox-m, yolox-l, yolox-x.
|
22 |
|
23 |
If you want to convert your customized model, use the flag -f to specify your exp file:
|
24 |
```shell
|
25 |
+
python tools/trt.py -f <YOLOX_EXP_FILE> -c <YOLOX_CHECKPOINT>
|
26 |
```
|
27 |
For example:
|
28 |
```shell
|
29 |
+
python tools/trt.py -f /path/to/your/yolox/exps/yolox_s.py -c your_ckpt.pth.tar
|
30 |
```
|
31 |
*yolox_s.py* can be any exp file modified by you.
|
32 |
|
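Once the serialized engine exists, it can be reloaded through torch2trt's TRTModule just like an ordinary state dict, which is also how the evaluator and demo code in this commit consume it. A short sketch (the checkpoint path below is illustrative):

```python
import torch
from torch2trt import TRTModule

# Reload the serialized TensorRT engine saved by tools/trt.py.
model_trt = TRTModule()
model_trt.load_state_dict(torch.load("YOLOX_outputs/yolox_s/model_trt.pth"))

x = torch.ones(1, 3, 640, 640).cuda()
with torch.no_grad():
    outputs = model_trt(x)
```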
docs/.gitkeep
ADDED
File without changes
|
docs/train_custom_data.md
ADDED
@@ -0,0 +1,118 @@
# Train Custom Data.
This page explains how to train your own custom data with YOLOX.

We take finetuning a YOLOX-S model on the VOC dataset as an example to give a clearer guide.

## 0. Before you start
Clone this repo and follow the [README](../README.md) to install YOLOX.

## 1. Create your own dataset
**Step 1** Prepare your own dataset with images and labels first. For labeling images, you may use a tool like [Labelme](https://github.com/wkentaro/labelme) or [CVAT](https://github.com/openvinotoolkit/cvat).

**Step 2** Then, write the corresponding Dataset class, which loads images and labels through the "\_\_getitem\_\_" method. We currently support the COCO format and the VOC format.

You can also write the Dataset on your own. Let's take the [VOC](../yolox/data/datasets/voc.py#L151) Dataset file as an example:
```python
    @Dataset.resize_getitem
    def __getitem__(self, index):
        img, target, img_info, img_id = self.pull_item(index)

        if self.preproc is not None:
            img, target = self.preproc(img, target, self.input_dim)

        return img, target, img_info, img_id
```

One more thing worth noting is that you should also implement the "[pull_item](../yolox/data/datasets/voc.py#L129)" and "[load_anno](../yolox/data/datasets/voc.py#L121)" methods for the Mosaic and MixUp augmentations.

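For reference, here is a minimal sketch of what those two methods usually look like for a custom dataset (the field names `self.image_paths` and `self.annotations` are illustrative, not part of the YOLOX API):

```python
import cv2
import numpy as np


class MyDataset:
    """Skeleton for illustration; a real dataset should inherit yolox.data.datasets.Dataset."""

    def load_anno(self, index):
        # Return only the labels of one image as a (num_objects, 5) array of
        # [x1, y1, x2, y2, class_id]. Mosaic/MixUp call this to sample extra
        # labels without paying the cost of decoding the image.
        return np.array(self.annotations[index], dtype=np.float32)

    def pull_item(self, index):
        # Return the raw (un-preprocessed) image together with its labels,
        # so the augmentation wrappers can compose several samples.
        img = cv2.imread(self.image_paths[index])
        target = self.load_anno(index)
        img_info = (img.shape[0], img.shape[1])
        return img, target, img_info, index
```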
**Step 3** Prepare the evaluator. We currently have a [COCO evaluator](../yolox/evaluators/coco_evaluator.py) and a [VOC evaluator](../yolox/evaluators/voc_evaluator.py).
If your data uses its own format or evaluation metric, you may write your own evaluator.

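As a rough guide, a custom evaluator only needs to mirror the interface of the built-in evaluators: it is constructed with a dataloader, the test image size, and the confidence/NMS thresholds, and it exposes an evaluate(model, ...) method that returns two summary scores plus a printable summary string. A minimal sketch, with the metric computation left as a placeholder (this skeleton is illustrative, not part of the repository):

```python
import torch


class MyEvaluator:
    """Sketch of the evaluator interface the trainer expects; details are illustrative."""

    def __init__(self, dataloader, img_size, confthre, nmsthre, num_classes):
        self.dataloader = dataloader
        self.img_size = img_size
        self.confthre = confthre
        self.nmsthre = nmsthre
        self.num_classes = num_classes

    def evaluate(self, model, distributed=False, half=False):
        model.eval()
        predictions = {}
        with torch.no_grad():
            for imgs, _, info_imgs, ids in self.dataloader:
                outputs = model(imgs.cuda())
                # Post-process (confidence filter + NMS) and store detections per
                # image id, e.g. predictions[int(img_id)] = (boxes, classes, scores).
        # Turn `predictions` into your metric of choice; the trainer only needs
        # two scalar scores and a summary string back.
        main_score, secondary_score, summary = 0.0, 0.0, "custom eval"
        return main_score, secondary_score, summary
```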
## 2. Create your Exp file to control everything
We put everything involved in a model into one single Exp file, including the model setting, training setting, and testing setting.

A complete Exp file is at [yolox_base.py](../yolox/exp/yolox_base.py). It may be too long to write for every exp, but you can inherit the base Exp file and only overwrite the changed parts.

Let's still take the [VOC Exp file](../exps/example/yolox_voc/yolox_voc_s.py) as an example.

We select the YOLOX-S model here, so we should change the network depth and width. VOC has only 20 classes, so we should also change num_classes.

These configs are changed in the `__init__()` method:
```python
class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        self.num_classes = 20
        self.depth = 0.33
        self.width = 0.50
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
```

Besides, you should also overwrite the dataset and evaluator prepared above before training the model on your own data.

Please see "[get_data_loader](../exps/example/yolox_voc/yolox_voc_s.py#L20)", "[get_eval_loader](../exps/example/yolox_voc/yolox_voc_s.py#L82)", and "[get_evaluator](../exps/example/yolox_voc/yolox_voc_s.py#L113)" for more details.

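For example, the VOC Exp file added in this commit overrides get_evaluator roughly like this (abridged from exps/example/yolox_voc/yolox_voc_s.py, importing the evaluator from the renamed yolox.evaluators package):

```python
    def get_evaluator(self, batch_size, is_distributed, testdev=False):
        from yolox.evaluators import VOCEvaluator

        val_loader = self.get_eval_loader(batch_size, is_distributed, testdev=testdev)
        return VOCEvaluator(
            dataloader=val_loader,
            img_size=self.test_size,
            confthre=self.test_conf,
            nmsthre=self.nmsthre,
            num_classes=self.num_classes,
        )
```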
## 3. Train
Except for special cases, we always recommend using our [COCO pretrained weights](../README.md) for initialization.

Once you have the Exp file and the COCO pretrained weights we provide, you can train your own model with the following command:
```bash
python tools/train.py -f /path/to/your/Exp/file -d 8 -b 64 --fp16 -o -c /path/to/the/pretrained/weights
```

or take the YOLOX-S VOC training as an example:
```bash
python tools/train.py -f exps/example/yolox_voc/yolox_voc_s.py -d 8 -b 64 --fp16 -o -c /path/to/yolox_s.pth.tar
```

(Don't worry about the different shape of the detection head between the pretrained weights and your own model; we will handle it.)

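How this is typically handled, as a rough illustration only: when the checkpoint is loaded, entries whose shapes no longer match the current model (here, the class-prediction layers of the head) are skipped, so they keep their fresh initialization. The helper below is a sketch under that assumption, not the repository's exact loading code:

```python
import torch


def load_matching_weights(model, ckpt_path):
    # Keep only checkpoint tensors whose name and shape match the current model,
    # so a head built for a different number of classes stays at its random init.
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state_dict = ckpt.get("model", ckpt)
    model_state = model.state_dict()
    filtered = {
        k: v for k, v in state_dict.items()
        if k in model_state and v.shape == model_state[k].shape
    }
    model.load_state_dict(filtered, strict=False)
    return model
```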
## 4. Tips for Best Training Results

As YOLOX is an anchor-free detector with only a few hyper-parameters, most of the time good results can be obtained with no changes to the model or training settings.
We thus always recommend that you first train with all default training settings.

If at first you don't get good results, there are steps you can consider to improve them.

**Model Selection** We provide YOLOX-Nano, YOLOX-Tiny, and YOLOX-S for mobile deployments, while YOLOX-M/L/X are for cloud or high-performance GPU deployments.

If your deployment runs into compatibility trouble, we recommend YOLOX-DarkNet53.

**Training Configs** If your training overfits early, you can reduce max\_epochs or decrease base\_lr and min\_lr\_ratio in your Exp file:
```python
# --------------  training config --------------------- #
self.warmup_epochs = 5
self.max_epoch = 300
self.warmup_lr = 0
self.basic_lr_per_img = 0.01 / 64.0
self.scheduler = "yoloxwarmcos"
self.no_aug_epochs = 15
self.min_lr_ratio = 0.05
self.ema = True

self.weight_decay = 5e-4
self.momentum = 0.9
```

**Aug Configs** You may also change the degree of the augmentations.

Generally, for small models you should weaken the augmentation, while for large models or small datasets you may enhance the augmentation in your Exp file:
```python
# --------------- transform config ----------------- #
self.degrees = 10.0
self.translate = 0.1
self.scale = (0.1, 2)
self.mscale = (0.8, 1.6)
self.shear = 2.0
self.perspective = 0.0
self.enable_mixup = True
```

**Design your own detector** You may refer to our [Arxiv]() paper for details and suggestions on designing your own detector.
exps/example/yolox_voc/yolox_voc_s.py
ADDED
@@ -0,0 +1,124 @@
# encoding: utf-8
import os
import random
import torch
import torch.nn as nn
import torch.distributed as dist

from yolox.exp import Exp as MyExp


class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        self.num_classes = 20
        self.depth = 0.33
        self.width = 0.50
        self.eval_interval = 2
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

    def get_data_loader(self, batch_size, is_distributed, no_aug=False):
        from yolox.data import (
            VOCDetection,
            TrainTransform,
            YoloBatchSampler,
            DataLoader,
            InfiniteSampler,
            MosaicDetection,
        )

        dataset = VOCDetection(
            data_dir='/data/Datasets/VOCdevkit',
            image_sets=[('2007', 'trainval'), ('2012', 'trainval')],
            img_size=self.input_size,
            preproc=TrainTransform(
                rgb_means=(0.485, 0.456, 0.406),
                std=(0.229, 0.224, 0.225),
                max_labels=50,
            ),
        )

        dataset = MosaicDetection(
            dataset,
            mosaic=not no_aug,
            img_size=self.input_size,
            preproc=TrainTransform(
                rgb_means=(0.485, 0.456, 0.406),
                std=(0.229, 0.224, 0.225),
                max_labels=120,
            ),
            degrees=self.degrees,
            translate=self.translate,
            scale=self.scale,
            shear=self.shear,
            perspective=self.perspective,
            enable_mixup=self.enable_mixup,
        )

        self.dataset = dataset

        if is_distributed:
            batch_size = batch_size // dist.get_world_size()
            sampler = InfiniteSampler(
                len(self.dataset), seed=self.seed if self.seed else 0
            )
        else:
            sampler = torch.utils.data.RandomSampler(self.dataset)

        batch_sampler = YoloBatchSampler(
            sampler=sampler,
            batch_size=batch_size,
            drop_last=False,
            input_dimension=self.input_size,
            mosaic=not no_aug,
        )

        dataloader_kwargs = {"num_workers": self.data_num_workers, "pin_memory": True}
        dataloader_kwargs["batch_sampler"] = batch_sampler
        train_loader = DataLoader(self.dataset, **dataloader_kwargs)

        return train_loader

    def get_eval_loader(self, batch_size, is_distributed, testdev=False):
        from yolox.data import VOCDetection, ValTransform

        valdataset = VOCDetection(
            data_dir='/data/Datasets/VOCdevkit',
            image_sets=[('2007', 'test')],
            img_size=self.test_size,
            preproc=ValTransform(
                rgb_means=(0.485, 0.456, 0.406),
                std=(0.229, 0.224, 0.225),
            ),
        )

        if is_distributed:
            batch_size = batch_size // dist.get_world_size()
            sampler = torch.utils.data.distributed.DistributedSampler(
                valdataset, shuffle=False
            )
        else:
            sampler = torch.utils.data.SequentialSampler(valdataset)

        dataloader_kwargs = {
            "num_workers": self.data_num_workers,
            "pin_memory": True,
            "sampler": sampler,
        }
        dataloader_kwargs["batch_size"] = batch_size
        val_loader = torch.utils.data.DataLoader(valdataset, **dataloader_kwargs)

        return val_loader

    def get_evaluator(self, batch_size, is_distributed, testdev=False):
        from yolox.evaluators import VOCEvaluator

        val_loader = self.get_eval_loader(batch_size, is_distributed, testdev=testdev)
        evaluator = VOCEvaluator(
            dataloader=val_loader,
            img_size=self.test_size,
            confthre=self.test_conf,
            nmsthre=self.nmsthre,
            num_classes=self.num_classes,
        )
        return evaluator
requirements.txt
CHANGED
@@ -12,3 +12,6 @@ Pillow
|
|
12 |
skimage
|
13 |
thop
|
14 |
ninja
|
|
|
|
|
|
|
|
12 |
skimage
|
13 |
thop
|
14 |
ninja
|
15 |
+
tabulate
|
16 |
+
tensorboard
|
17 |
+
onnxruntime
|
tools/demo.py
CHANGED
@@ -66,12 +66,6 @@ def make_parser():
|
|
66 |
action="store_true",
|
67 |
help="Using TensorRT model for testing.",
|
68 |
)
|
69 |
-
parser.add_argument(
|
70 |
-
"opts",
|
71 |
-
help="Modify config options using the command-line",
|
72 |
-
default=None,
|
73 |
-
nargs=argparse.REMAINDER,
|
74 |
-
)
|
75 |
return parser
|
76 |
|
77 |
|
@@ -137,13 +131,14 @@ class Predictor(object):
|
|
137 |
def visual(self, output, img_info, cls_conf=0.35):
|
138 |
ratio = img_info['ratio']
|
139 |
img = img_info['raw_img']
|
|
|
|
|
140 |
output = output.cpu()
|
141 |
|
142 |
bboxes = output[:, 0:4]
|
143 |
|
144 |
# preprocessing: resize
|
145 |
bboxes /= ratio
|
146 |
-
bboxes = xyxy2xywh(bboxes)
|
147 |
|
148 |
cls = output[:, 6]
|
149 |
scores = output[:, 4] * output[:, 5]
|
@@ -193,7 +188,7 @@ def imageflow_demo(predictor, vis_folder, current_time, args):
|
|
193 |
ret_val, frame = cap.read()
|
194 |
if ret_val:
|
195 |
outputs, img_info = predictor.inference(frame)
|
196 |
-
result_frame = predictor.
|
197 |
if args.save_result:
|
198 |
vid_writer.write(result_frame)
|
199 |
ch = cv2.waitKey(1)
|
@@ -258,7 +253,7 @@ def main(exp, args):
|
|
258 |
"TensorRT model is not support model fusing!"
|
259 |
trt_file = os.path.join(file_name, "model_trt.pth")
|
260 |
assert os.path.exists(trt_file), (
|
261 |
-
"TensorRT model is not found!\n Run python3
|
262 |
)
|
263 |
model.head.decode_in_inference = False
|
264 |
decoder = model.head.decode_outputs
|
|
|
66 |
action="store_true",
|
67 |
help="Using TensorRT model for testing.",
|
68 |
)
|
|
|
|
|
|
|
|
|
|
|
|
|
69 |
return parser
|
70 |
|
71 |
|
|
|
131 |
def visual(self, output, img_info, cls_conf=0.35):
|
132 |
ratio = img_info['ratio']
|
133 |
img = img_info['raw_img']
|
134 |
+
if output is None:
|
135 |
+
return img
|
136 |
output = output.cpu()
|
137 |
|
138 |
bboxes = output[:, 0:4]
|
139 |
|
140 |
# preprocessing: resize
|
141 |
bboxes /= ratio
|
|
|
142 |
|
143 |
cls = output[:, 6]
|
144 |
scores = output[:, 4] * output[:, 5]
|
|
|
188 |
ret_val, frame = cap.read()
|
189 |
if ret_val:
|
190 |
outputs, img_info = predictor.inference(frame)
|
191 |
+
result_frame = predictor.visual(outputs[0], img_info)
|
192 |
if args.save_result:
|
193 |
vid_writer.write(result_frame)
|
194 |
ch = cv2.waitKey(1)
|
|
|
253 |
"TensorRT model is not support model fusing!"
|
254 |
trt_file = os.path.join(file_name, "model_trt.pth")
|
255 |
assert os.path.exists(trt_file), (
|
256 |
+
"TensorRT model is not found!\n Run python3 tools/trt.py first!"
|
257 |
)
|
258 |
model.head.decode_in_inference = False
|
259 |
decoder = model.head.decode_outputs
|
yolox/data/datasets/coco.py
CHANGED
@@ -46,29 +46,20 @@ class COCODataset(Dataset):
|
|
46 |
cats = self.coco.loadCats(self.coco.getCatIds())
|
47 |
self._classes = tuple([c["name"] for c in cats])
|
48 |
self.name = name
|
49 |
-
self.max_labels = 50
|
50 |
self.img_size = img_size
|
51 |
self.preproc = preproc
|
52 |
|
53 |
def __len__(self):
|
54 |
return len(self.ids)
|
55 |
|
56 |
-
def
|
57 |
id_ = self.ids[index]
|
|
|
|
|
58 |
|
59 |
im_ann = self.coco.loadImgs(id_)[0]
|
60 |
width = im_ann["width"]
|
61 |
height = im_ann["height"]
|
62 |
-
anno_ids = self.coco.getAnnIds(imgIds=[int(id_)], iscrowd=False)
|
63 |
-
annotations = self.coco.loadAnns(anno_ids)
|
64 |
-
|
65 |
-
# load image and preprocess
|
66 |
-
img_file = os.path.join(
|
67 |
-
self.data_dir, self.name, "{:012}".format(id_) + ".jpg"
|
68 |
-
)
|
69 |
-
|
70 |
-
img = cv2.imread(img_file)
|
71 |
-
assert img is not None
|
72 |
|
73 |
# load labels
|
74 |
valid_objs = []
|
@@ -90,6 +81,25 @@ class COCODataset(Dataset):
|
|
90 |
res[ix, 0:4] = obj["clean_bbox"]
|
91 |
res[ix, 4] = cls
|
92 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
93 |
img_info = (height, width)
|
94 |
|
95 |
return img, res, img_info, id_
|
@@ -105,7 +115,7 @@ class COCODataset(Dataset):
|
|
105 |
Returns:
|
106 |
img (numpy.ndarray): pre-processed image
|
107 |
padded_labels (torch.Tensor): pre-processed label data.
|
108 |
-
The shape is :math:`[
|
109 |
each label consists of [class, xc, yc, w, h]:
|
110 |
class (float): class index.
|
111 |
xc, yc (float) : center of bbox whose values range from 0 to 1.
|
|
|
46 |
cats = self.coco.loadCats(self.coco.getCatIds())
|
47 |
self._classes = tuple([c["name"] for c in cats])
|
48 |
self.name = name
|
|
|
49 |
self.img_size = img_size
|
50 |
self.preproc = preproc
|
51 |
|
52 |
def __len__(self):
|
53 |
return len(self.ids)
|
54 |
|
55 |
+
def load_anno(self, index):
|
56 |
id_ = self.ids[index]
|
57 |
+
anno_ids = self.coco.getAnnIds(imgIds=[int(id_)], iscrowd=False)
|
58 |
+
annotations = self.coco.loadAnns(anno_ids)
|
59 |
|
60 |
im_ann = self.coco.loadImgs(id_)[0]
|
61 |
width = im_ann["width"]
|
62 |
height = im_ann["height"]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
63 |
|
64 |
# load labels
|
65 |
valid_objs = []
|
|
|
81 |
res[ix, 0:4] = obj["clean_bbox"]
|
82 |
res[ix, 4] = cls
|
83 |
|
84 |
+
return res
|
85 |
+
|
86 |
+
def pull_item(self, index):
|
87 |
+
id_ = self.ids[index]
|
88 |
+
|
89 |
+
im_ann = self.coco.loadImgs(id_)[0]
|
90 |
+
width = im_ann["width"]
|
91 |
+
height = im_ann["height"]
|
92 |
+
|
93 |
+
# load image and preprocess
|
94 |
+
img_file = os.path.join(
|
95 |
+
self.data_dir, self.name, "{:012}".format(id_) + ".jpg"
|
96 |
+
)
|
97 |
+
|
98 |
+
img = cv2.imread(img_file)
|
99 |
+
assert img is not None
|
100 |
+
|
101 |
+
# load anno
|
102 |
+
res = self.load_anno(index)
|
103 |
img_info = (height, width)
|
104 |
|
105 |
return img, res, img_info, id_
|
|
|
115 |
Returns:
|
116 |
img (numpy.ndarray): pre-processed image
|
117 |
padded_labels (torch.Tensor): pre-processed label data.
|
118 |
+
The shape is :math:`[max_labels, 5]`.
|
119 |
each label consists of [class, xc, yc, w, h]:
|
120 |
class (float): class index.
|
121 |
xc, yc (float) : center of bbox whose values range from 0 to 1.
|
yolox/data/datasets/mosaicdetection.py
CHANGED
@@ -93,7 +93,6 @@ class MosaicDetection(Dataset):
|
|
93 |
labels[:, 1] = scale * _labels[:, 1] + padh
|
94 |
labels[:, 2] = scale * _labels[:, 2] + padw
|
95 |
labels[:, 3] = scale * _labels[:, 3] + padh
|
96 |
-
|
97 |
labels4.append(labels)
|
98 |
|
99 |
if len(labels4):
|
@@ -136,9 +135,7 @@ class MosaicDetection(Dataset):
|
|
136 |
cp_labels = []
|
137 |
while len(cp_labels) == 0:
|
138 |
cp_index = random.randint(0, self.__len__() - 1)
|
139 |
-
|
140 |
-
anno_ids = self._dataset.coco.getAnnIds(imgIds=[int(id_)], iscrowd=False)
|
141 |
-
cp_labels = self._dataset.coco.loadAnns(anno_ids)
|
142 |
img, cp_labels, _, _ = self._dataset.pull_item(cp_index)
|
143 |
|
144 |
if len(img.shape) == 3:
|
|
|
93 |
labels[:, 1] = scale * _labels[:, 1] + padh
|
94 |
labels[:, 2] = scale * _labels[:, 2] + padw
|
95 |
labels[:, 3] = scale * _labels[:, 3] + padh
|
|
|
96 |
labels4.append(labels)
|
97 |
|
98 |
if len(labels4):
|
|
|
135 |
cp_labels = []
|
136 |
while len(cp_labels) == 0:
|
137 |
cp_index = random.randint(0, self.__len__() - 1)
|
138 |
+
cp_labels = self._dataset.load_anno(cp_index)
|
|
|
|
|
139 |
img, cp_labels, _, _ = self._dataset.pull_item(cp_index)
|
140 |
|
141 |
if len(img.shape) == 3:
|
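With load_anno in place, the MixUp candidate search above no longer needs to go through the COCO API, so any dataset that implements load_anno can feed MixUp. A minimal sketch of that selection loop (the function and variable names here are illustrative, not the repository's exact code):

```python
import random


def pick_mixup_candidate(dataset):
    # Retry until a sample with at least one label is found; only the cheap
    # annotation lookup runs inside the loop, and the image is decoded once at the end.
    cp_labels = []
    while len(cp_labels) == 0:
        cp_index = random.randint(0, len(dataset) - 1)
        cp_labels = dataset.load_anno(cp_index)
    img, cp_labels, _, _ = dataset.pull_item(cp_index)
    return img, cp_labels
```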
yolox/data/datasets/voc.py
CHANGED
@@ -19,16 +19,6 @@ from yolox.evalutors.voc_eval import voc_eval
|
|
19 |
from .datasets_wrapper import Dataset
|
20 |
from .voc_classes import VOC_CLASSES
|
21 |
|
22 |
-
# for making bounding boxes pretty
|
23 |
-
COLORS = (
|
24 |
-
(255, 0, 0, 128),
|
25 |
-
(0, 255, 0, 128),
|
26 |
-
(0, 0, 255, 128),
|
27 |
-
(0, 255, 255, 128),
|
28 |
-
(255, 0, 255, 128),
|
29 |
-
(255, 255, 0, 128),
|
30 |
-
)
|
31 |
-
|
32 |
|
33 |
class AnnotationTransform(object):
|
34 |
|
@@ -100,16 +90,17 @@ class VOCDetection(Dataset):
|
|
100 |
|
101 |
def __init__(
|
102 |
self,
|
103 |
-
|
104 |
-
image_sets,
|
|
|
105 |
preproc=None,
|
106 |
target_transform=AnnotationTransform(),
|
107 |
-
input_dim=(416, 416),
|
108 |
dataset_name="VOC0712",
|
109 |
):
|
110 |
-
super().__init__(
|
111 |
-
self.root =
|
112 |
self.image_set = image_sets
|
|
|
113 |
self.preproc = preproc
|
114 |
self.target_transform = target_transform
|
115 |
self.name = dataset_name
|
@@ -125,59 +116,16 @@ class VOCDetection(Dataset):
|
|
125 |
):
|
126 |
self.ids.append((rootpath, line.strip()))
|
127 |
|
128 |
-
@Dataset.resize_getitem
|
129 |
-
def __getitem__(self, index):
|
130 |
-
img_id = self.ids[index]
|
131 |
-
target = ET.parse(self._annopath % img_id).getroot()
|
132 |
-
img = cv2.imread(self._imgpath % img_id, cv2.IMREAD_COLOR)
|
133 |
-
# img = Image.open(self._imgpath % img_id).convert('RGB')
|
134 |
-
|
135 |
-
height, width, _ = img.shape
|
136 |
-
|
137 |
-
if self.target_transform is not None:
|
138 |
-
target = self.target_transform(target)
|
139 |
-
|
140 |
-
if self.preproc is not None:
|
141 |
-
img, target = self.preproc(img, target, self.input_dim)
|
142 |
-
# print(img.size())
|
143 |
-
|
144 |
-
img_info = (width, height)
|
145 |
-
|
146 |
-
return img, target, img_info, img_id
|
147 |
-
|
148 |
def __len__(self):
|
149 |
return len(self.ids)
|
150 |
|
151 |
-
def
|
152 |
-
"""Returns the original image object at index in PIL form
|
153 |
-
|
154 |
-
Note: not using self.__getitem__(), as any transformations passed in
|
155 |
-
could mess up this functionality.
|
156 |
-
|
157 |
-
Argument:
|
158 |
-
index (int): index of img to show
|
159 |
-
Return:
|
160 |
-
PIL img
|
161 |
-
"""
|
162 |
img_id = self.ids[index]
|
163 |
-
|
164 |
-
|
165 |
-
|
166 |
-
"""Returns the original annotation of image at index
|
167 |
-
|
168 |
-
Note: not using self.__getitem__(), as any transformations passed in
|
169 |
-
could mess up this functionality.
|
170 |
|
171 |
-
|
172 |
-
index (int): index of img to get annotation of
|
173 |
-
Return:
|
174 |
-
list: [img_id, [(label, bbox coords),...]]
|
175 |
-
eg: ('001718', [('dog', (96, 13, 438, 332))])
|
176 |
-
"""
|
177 |
-
img_id = self.ids[index]
|
178 |
-
anno = ET.parse(self._annopath % img_id).getroot()
|
179 |
-
gt = self.target_transform(anno, 1, 1)
|
180 |
-
return img_id[1], gt
|
181 |
|
182 |
def pull_item(self, index):
|
183 |
"""Returns the original image and target at an index for mixup
|
@@ -191,14 +139,21 @@ class VOCDetection(Dataset):
|
|
191 |
img, target
|
192 |
"""
|
193 |
img_id = self.ids[index]
|
194 |
-
target = ET.parse(self._annopath % img_id).getroot()
|
195 |
img = cv2.imread(self._imgpath % img_id, cv2.IMREAD_COLOR)
|
196 |
-
|
197 |
height, width, _ = img.shape
|
198 |
|
|
|
|
|
199 |
img_info = (width, height)
|
200 |
-
|
201 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
202 |
|
203 |
return img, target, img_info, img_id
|
204 |
|
@@ -212,7 +167,7 @@ class VOCDetection(Dataset):
|
|
212 |
all_boxes[class][image] = [] or np.array of shape #dets x 5
|
213 |
"""
|
214 |
self._write_voc_results_file(all_boxes)
|
215 |
-
IouTh = np.linspace(0.5, 0.95, np.round((0.95 - 0.5) / 0.05) + 1, endpoint=True)
|
216 |
mAPs = []
|
217 |
for iou in IouTh:
|
218 |
mAP = self._do_python_eval(output_dir, iou)
|
@@ -270,7 +225,7 @@ class VOCDetection(Dataset):
|
|
270 |
aps = []
|
271 |
# The PASCAL VOC metric changed in 2010
|
272 |
use_07_metric = True if int(self._year) < 2010 else False
|
273 |
-
print("
|
274 |
if output_dir is not None and not os.path.isdir(output_dir):
|
275 |
os.mkdir(output_dir)
|
276 |
for i, cls in enumerate(VOC_CLASSES):
|
|
|
19 |
from .datasets_wrapper import Dataset
|
20 |
from .voc_classes import VOC_CLASSES
|
21 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
22 |
|
23 |
class AnnotationTransform(object):
|
24 |
|
|
|
90 |
|
91 |
def __init__(
|
92 |
self,
|
93 |
+
data_dir,
|
94 |
+
image_sets=[('2007', 'trainval'), ('2012', 'trainval')],
|
95 |
+
img_size=(416, 416),
|
96 |
preproc=None,
|
97 |
target_transform=AnnotationTransform(),
|
|
|
98 |
dataset_name="VOC0712",
|
99 |
):
|
100 |
+
super().__init__(img_size)
|
101 |
+
self.root = data_dir
|
102 |
self.image_set = image_sets
|
103 |
+
self.img_size = img_size
|
104 |
self.preproc = preproc
|
105 |
self.target_transform = target_transform
|
106 |
self.name = dataset_name
|
|
|
116 |
):
|
117 |
self.ids.append((rootpath, line.strip()))
|
118 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
119 |
def __len__(self):
|
120 |
return len(self.ids)
|
121 |
|
122 |
+
def load_anno(self, index):
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
123 |
img_id = self.ids[index]
|
124 |
+
target = ET.parse(self._annopath % img_id).getroot()
|
125 |
+
if self.target_transform is not None:
|
126 |
+
target = self.target_transform(target)
|
|
|
|
|
|
|
|
|
127 |
|
128 |
+
return target
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
129 |
|
130 |
def pull_item(self, index):
|
131 |
"""Returns the original image and target at an index for mixup
|
|
|
139 |
img, target
|
140 |
"""
|
141 |
img_id = self.ids[index]
|
|
|
142 |
img = cv2.imread(self._imgpath % img_id, cv2.IMREAD_COLOR)
|
|
|
143 |
height, width, _ = img.shape
|
144 |
|
145 |
+
target = self.load_anno(index)
|
146 |
+
|
147 |
img_info = (width, height)
|
148 |
+
|
149 |
+
return img, target, img_info, index
|
150 |
+
|
151 |
+
@Dataset.resize_getitem
|
152 |
+
def __getitem__(self, index):
|
153 |
+
img, target, img_info, img_id = self.pull_item(index)
|
154 |
+
|
155 |
+
if self.preproc is not None:
|
156 |
+
img, target = self.preproc(img, target, self.input_dim)
|
157 |
|
158 |
return img, target, img_info, img_id
|
159 |
|
|
|
167 |
all_boxes[class][image] = [] or np.array of shape #dets x 5
|
168 |
"""
|
169 |
self._write_voc_results_file(all_boxes)
|
170 |
+
IouTh = np.linspace(0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True)
|
171 |
mAPs = []
|
172 |
for iou in IouTh:
|
173 |
mAP = self._do_python_eval(output_dir, iou)
|
|
|
225 |
aps = []
|
226 |
# The PASCAL VOC metric changed in 2010
|
227 |
use_07_metric = True if int(self._year) < 2010 else False
|
228 |
+
print("Eval IoU : {:.2f}".format(iou))
|
229 |
if output_dir is not None and not os.path.isdir(output_dir):
|
230 |
os.mkdir(output_dir)
|
231 |
for i, cls in enumerate(VOC_CLASSES):
|
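The np.linspace change above also deserves a note: np.round returns a float, and newer NumPy versions reject a non-integer num argument, so the explicit int(...) cast keeps the 10-threshold COCO-style IoU sweep working. A quick check of what it produces:

```python
import numpy as np

# Ten IoU thresholds from 0.50 to 0.95 in steps of 0.05.
iou_thresholds = np.linspace(0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True)
print(iou_thresholds)  # [0.5  0.55 0.6  0.65 0.7  0.75 0.8  0.85 0.9  0.95]
```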
yolox/{evalutors → evaluators}/__init__.py
RENAMED
File without changes
|
yolox/{evalutors → evaluators}/coco_evaluator.py
RENAMED
File without changes
|
yolox/{evalutors → evaluators}/voc_eval.py
RENAMED
File without changes
|
yolox/evaluators/voc_evaluator.py
ADDED
@@ -0,0 +1,183 @@
1 |
+
#!/usr/bin/env python3
|
2 |
+
# -*- coding:utf-8 -*-
|
3 |
+
# Copyright (c) Megvii, Inc. and its affiliates.
|
4 |
+
|
5 |
+
import sys
|
6 |
+
import tempfile
|
7 |
+
import time
|
8 |
+
from collections import ChainMap
|
9 |
+
from loguru import logger
|
10 |
+
from tqdm import tqdm
|
11 |
+
|
12 |
+
import numpy as np
|
13 |
+
|
14 |
+
import torch
|
15 |
+
|
16 |
+
from yolox.utils import gather, is_main_process, postprocess, synchronize, time_synchronized
|
17 |
+
|
18 |
+
|
19 |
+
class VOCEvaluator:
|
20 |
+
"""
|
21 |
+
VOC AP Evaluation class.
|
22 |
+
"""
|
23 |
+
|
24 |
+
def __init__(
|
25 |
+
self, dataloader, img_size, confthre, nmsthre, num_classes,
|
26 |
+
):
|
27 |
+
"""
|
28 |
+
Args:
|
29 |
+
dataloader (Dataloader): evaluate dataloader.
|
30 |
+
img_size (int): image size after preprocess. images are resized
|
31 |
+
to squares whose shape is (img_size, img_size).
|
32 |
+
confthre (float): confidence threshold ranging from 0 to 1, which
|
33 |
+
is defined in the config file.
|
34 |
+
nmsthre (float): IoU threshold of non-max supression ranging from 0 to 1.
|
35 |
+
"""
|
36 |
+
self.dataloader = dataloader
|
37 |
+
self.img_size = img_size
|
38 |
+
self.confthre = confthre
|
39 |
+
self.nmsthre = nmsthre
|
40 |
+
self.num_classes = num_classes
|
41 |
+
self.num_images = len(dataloader.dataset)
|
42 |
+
|
43 |
+
def evaluate(
|
44 |
+
self, model, distributed=False, half=False, trt_file=None, decoder=None, test_size=None
|
45 |
+
):
|
46 |
+
"""
|
47 |
+
VOC average precision (AP) Evaluation. Iterate inference on the test dataset
|
48 |
+
and the results are evaluated by COCO API.
|
49 |
+
|
50 |
+
NOTE: This function will change training mode to False, please save states if needed.
|
51 |
+
|
52 |
+
Args:
|
53 |
+
model : model to evaluate.
|
54 |
+
|
55 |
+
Returns:
|
56 |
+
ap50_95 (float) : COCO style AP of IoU=50:95
|
57 |
+
ap50 (float) : VOC 2007 metric AP of IoU=50
|
58 |
+
summary (sr): summary info of evaluation.
|
59 |
+
"""
|
60 |
+
# TODO half to amp_test
|
61 |
+
tensor_type = torch.cuda.HalfTensor if half else torch.cuda.FloatTensor
|
62 |
+
model = model.eval()
|
63 |
+
if half:
|
64 |
+
model = model.half()
|
65 |
+
ids = []
|
66 |
+
data_list = {}
|
67 |
+
progress_bar = tqdm if is_main_process() else iter
|
68 |
+
|
69 |
+
inference_time = 0
|
70 |
+
nms_time = 0
|
71 |
+
n_samples = len(self.dataloader) - 1
|
72 |
+
|
73 |
+
if trt_file is not None:
|
74 |
+
from torch2trt import TRTModule
|
75 |
+
model_trt = TRTModule()
|
76 |
+
model_trt.load_state_dict(torch.load(trt_file))
|
77 |
+
|
78 |
+
x = torch.ones(1, 3, test_size[0], test_size[1]).cuda()
|
79 |
+
model(x)
|
80 |
+
model = model_trt
|
81 |
+
|
82 |
+
for cur_iter, (imgs, _, info_imgs, ids) in enumerate(progress_bar(self.dataloader)):
|
83 |
+
with torch.no_grad():
|
84 |
+
imgs = imgs.type(tensor_type)
|
85 |
+
|
86 |
+
# skip the last iters since the batch size might not be enough for batch inference
|
87 |
+
is_time_record = cur_iter < len(self.dataloader) - 1
|
88 |
+
if is_time_record:
|
89 |
+
start = time.time()
|
90 |
+
|
91 |
+
outputs = model(imgs)
|
92 |
+
if decoder is not None:
|
93 |
+
outputs = decoder(outputs, dtype=outputs.type())
|
94 |
+
|
95 |
+
if is_time_record:
|
96 |
+
infer_end = time_synchronized()
|
97 |
+
inference_time += infer_end - start
|
98 |
+
|
99 |
+
outputs = postprocess(
|
100 |
+
outputs, self.num_classes, self.confthre, self.nmsthre
|
101 |
+
)
|
102 |
+
if is_time_record:
|
103 |
+
nms_end = time_synchronized()
|
104 |
+
nms_time += nms_end - infer_end
|
105 |
+
|
106 |
+
data_list.update(self.convert_to_voc_format(outputs, info_imgs, ids))
|
107 |
+
|
108 |
+
statistics = torch.cuda.FloatTensor([inference_time, nms_time, n_samples])
|
109 |
+
if distributed:
|
110 |
+
data_list = gather(data_list, dst=0)
|
111 |
+
data_list = ChainMap(*data_list)
|
112 |
+
torch.distributed.reduce(statistics, dst=0)
|
113 |
+
|
114 |
+
eval_results = self.evaluate_prediction(data_list, statistics)
|
115 |
+
synchronize()
|
116 |
+
return eval_results
|
117 |
+
|
118 |
+
def convert_to_voc_format(self, outputs, info_imgs, ids):
|
119 |
+
predictions = {}
|
120 |
+
for (output, img_h, img_w, img_id) in zip(outputs, info_imgs[0], info_imgs[1], ids):
|
121 |
+
if output is None:
|
122 |
+
predictions[int(img_id)] = (None, None, None)
|
123 |
+
continue
|
124 |
+
output = output.cpu()
|
125 |
+
|
126 |
+
bboxes = output[:, 0:4]
|
127 |
+
|
128 |
+
# preprocessing: resize
|
129 |
+
scale = min(self.img_size[0] / float(img_h), self.img_size[1] / float(img_w))
|
130 |
+
bboxes /= scale
|
131 |
+
|
132 |
+
cls = output[:, 6]
|
133 |
+
scores = output[:, 4] * output[:, 5]
|
134 |
+
|
135 |
+
predictions[int(img_id)] = (bboxes, cls, scores)
|
136 |
+
return predictions
|
137 |
+
|
138 |
+
def evaluate_prediction(self, data_dict, statistics):
|
139 |
+
if not is_main_process():
|
140 |
+
return 0, 0, None
|
141 |
+
|
142 |
+
logger.info("Evaluate in main process...")
|
143 |
+
|
144 |
+
inference_time = statistics[0].item()
|
145 |
+
nms_time = statistics[1].item()
|
146 |
+
n_samples = statistics[2].item()
|
147 |
+
|
148 |
+
a_infer_time = 1000 * inference_time / (n_samples * self.dataloader.batch_size)
|
149 |
+
a_nms_time = 1000 * nms_time / (n_samples * self.dataloader.batch_size)
|
150 |
+
|
151 |
+
time_info = ", ".join(
|
152 |
+
["Average {} time: {:.2f} ms".format(k, v) for k, v in zip(
|
153 |
+
["forward", "NMS", "inference"],
|
154 |
+
[a_infer_time, a_nms_time, (a_infer_time + a_nms_time)]
|
155 |
+
)]
|
156 |
+
)
|
157 |
+
|
158 |
+
info = time_info + "\n"
|
159 |
+
|
160 |
+
all_boxes = [[[] for _ in range(self.num_images)] for _ in range(self.num_classes)]
|
161 |
+
for img_num in range(self.num_images):
|
162 |
+
bboxes, cls, scores = data_dict[img_num]
|
163 |
+
if bboxes is None:
|
164 |
+
for j in range(self.num_classes):
|
165 |
+
all_boxes[j][img_num] = np.empty([0, 5], dtype=np.float32)
|
166 |
+
continue
|
167 |
+
for j in range(self.num_classes):
|
168 |
+
mask_c = cls == j
|
169 |
+
if sum(mask_c) == 0:
|
170 |
+
all_boxes[j][img_num] = np.empty([0, 5], dtype=np.float32)
|
171 |
+
continue
|
172 |
+
|
173 |
+
c_dets = torch.cat((bboxes, scores.unsqueeze(1)), dim=1)
|
174 |
+
all_boxes[j][img_num] = c_dets[mask_c].numpy()
|
175 |
+
|
176 |
+
sys.stdout.write(
|
177 |
+
"im_eval: {:d}/{:d} \r".format(img_num + 1, self.num_images)
|
178 |
+
)
|
179 |
+
sys.stdout.flush()
|
180 |
+
|
181 |
+
with tempfile.TemporaryDirectory() as tempdir:
|
182 |
+
mAP50, mAP70 = self.dataloader.dataset.evaluate_detections(all_boxes, tempdir)
|
183 |
+
return mAP50, mAP70, info
|
yolox/evalutors/voc_evaluator.py
DELETED
@@ -1,202 +0,0 @@
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copyright (c) Megvii, Inc. and its affiliates.

# NOTE: this file is not finished.
import sys
import tempfile
import time
from tqdm import tqdm

import torch

from yolox.data.dataset.vocdataset import ValTransform
from yolox.utils import get_rank, is_main_process, make_pred_vis, make_vis, synchronize


def _accumulate_predictions_from_multiple_gpus(predictions_per_gpu):
    all_predictions = dist.scatter_gather(predictions_per_gpu)
    if not is_main_process():
        return
    # merge the list of dicts
    predictions = {}
    for p in all_predictions:
        predictions.update(p)
    # convert a dict where the key is the index in a list
    image_ids = list(sorted(predictions.keys()))
    if len(image_ids) != image_ids[-1] + 1:
        print("num_imgs: ", len(image_ids))
        print("last img_id: ", image_ids[-1])
        print(
            "Number of images that were gathered from multiple processes is not "
            "a contiguous set. Some images might be missing from the evaluation"
        )

    # convert to a list
    predictions = [predictions[i] for i in image_ids]
    return predictions


class VOCEvaluator:
    """
    COCO AP Evaluation class.
    All the data in the val2017 dataset are processed \
    and evaluated by COCO API.
    """

    def __init__(self, data_dir, img_size, confthre, nmsthre, vis=False):
        """
        Args:
            data_dir (str): dataset root directory
            img_size (int): image size after preprocess. images are resized \
                to squares whose shape is (img_size, img_size).
            confthre (float):
                confidence threshold ranging from 0 to 1, \
                which is defined in the config file.
            nmsthre (float):
                IoU threshold of non-max supression ranging from 0 to 1.
        """
        test_sets = [("2007", "test")]
        self.dataset = VOCDetection(
            root=data_dir,
            image_sets=test_sets,
            input_dim=img_size,
            preproc=ValTransform(
                rgb_means=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)
            ),
        )
        self.num_images = len(self.dataset)
        self.dataloader = torch.utils.data.DataLoader(
            self.dataset, batch_size=1, shuffle=False, num_workers=0
        )
        self.img_size = img_size
        self.confthre = confthre
        self.nmsthre = nmsthre
        self.vis = vis

    def evaluate(self, model, distributed=False):
        """
        COCO average precision (AP) Evaluation. Iterate inference on the test dataset
        and the results are evaluated by COCO API.
        Args:
            model : model object
        Returns:
            ap50_95 (float) : calculated COCO AP for IoU=50:95
            ap50 (float) : calculated COCO AP for IoU=50
        """
        if isinstance(model, torch.nn.parallel.DistributedDataParallel):
            model = model.module
        model.eval()
        cuda = torch.cuda.is_available()
        Tensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor

        ids = []
        data_dict = []
        dataiterator = iter(self.dataloader)
        img_num = 0
        indices = list(range(self.num_images))
        dis_indices = indices[get_rank() :: distributed_util.get_world_size()]
        progress_bar = tqdm if distributed_util.is_main_process() else iter
        num_classes = 20
        predictions = {}

        if is_main_process():
            inference_time = 0
            nms_time = 0
            n_samples = len(dis_indices)

        for i in progress_bar(dis_indices):
            img, _, info_img, id_ = self.dataset[i]  # load a batch
            info_img = [float(info) for info in info_img]
            ids.append(id_)
            with torch.no_grad():
                img = Variable(img.type(Tensor).unsqueeze(0))

                if is_main_process() and i > 9:
                    start = time.time()

                if self.vis:
                    outputs, fuse_weights, fused_f = model(img)
                else:
                    outputs = model(img)

                if is_main_process() and i > 9:
                    infer_end = time.time()
                    inference_time += infer_end - start

                outputs = postprocess(outputs, 20, self.confthre, self.nmsthre)

                if is_main_process() and i > 9:
                    nms_end = time.time()
                    nms_time += nms_end - infer_end

            if outputs[0] is None:
                predictions[i] = (None, None, None)
                continue
            outputs = outputs[0].cpu().data

            bboxes = outputs[:, 0:4]
            bboxes[:, 0::2] *= info_img[0] / self.img_size[0]
            bboxes[:, 1::2] *= info_img[1] / self.img_size[1]
            cls = outputs[:, 6]
            scores = outputs[:, 4] * outputs[:, 5]
            predictions[i] = (bboxes, cls, scores)

            if self.vis:
                o_img, _, _, _ = self.dataset.pull_item(i)
                make_vis("VOC", i, o_img, fuse_weights, fused_f)
                class_names = self.dataset._classes

                bbox = bboxes.clone()
                bbox[:, 2] = bbox[:, 2] - bbox[:, 0]
                bbox[:, 3] = bbox[:, 3] - bbox[:, 1]

                make_pred_vis("VOC", i, o_img, class_names, bbox, cls, scores)

            if is_main_process():
                o_img, _, _, _ = self.dataset.pull_item(i)
                class_names = self.dataset._classes
                bbox = bboxes.clone()
                bbox[:, 2] = bbox[:, 2] - bbox[:, 0]
                bbox[:, 3] = bbox[:, 3] - bbox[:, 1]
                make_pred_vis("VOC", i, o_img, class_names, bbox, cls, scores)

        synchronize()
        predictions = _accumulate_predictions_from_multiple_gpus(predictions)
        if not is_main_process():
            return 0, 0

        print("Main process Evaluating...")

        a_infer_time = 1000 * inference_time / (n_samples - 10)
        a_nms_time = 1000 * nms_time / (n_samples - 10)

        print(
            "Average forward time: %.2f ms, Average NMS time: %.2f ms, Average inference time: %.2f ms"
            % (a_infer_time, a_nms_time, (a_infer_time + a_nms_time))
        )

        all_boxes = [[[] for _ in range(self.num_images)] for _ in range(num_classes)]
        for img_num in range(self.num_images):
            bboxes, cls, scores = predictions[img_num]
            if bboxes is None:
                for j in range(num_classes):
                    all_boxes[j][img_num] = np.empty([0, 5], dtype=np.float32)
                continue
            for j in range(num_classes):
                mask_c = cls == j
                if sum(mask_c) == 0:
                    all_boxes[j][img_num] = np.empty([0, 5], dtype=np.float32)
                    continue

                c_dets = torch.cat((bboxes, scores.unsqueeze(1)), dim=1)
                all_boxes[j][img_num] = c_dets[mask_c].numpy()

            sys.stdout.write(
                "im_eval: {:d}/{:d} \r".format(img_num + 1, self.num_images)
            )
            sys.stdout.flush()

        with tempfile.TemporaryDirectory() as tempdir:
            mAP50, mAP70 = self.dataset.evaluate_detections(all_boxes, tempdir)
            return mAP50, mAP70
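The deleted evaluator relied on a custom `_accumulate_predictions_from_multiple_gpus` built on `dist.scatter_gather`; the replacement above instead gathers the per-rank prediction dicts and flattens them with `collections.ChainMap`. A single-process sketch of that merge step, with fabricated per-rank dicts standing in for the gathered results:

```python
from collections import ChainMap

# Stand-ins for the image_id -> (bboxes, cls, scores) dicts produced on each rank.
rank0_preds = {0: ("boxes0", "cls0", "scores0"), 2: ("boxes2", "cls2", "scores2")}
rank1_preds = {1: ("boxes1", "cls1", "scores1"), 3: ("boxes3", "cls3", "scores3")}

# After gather(data_list, dst=0) the main process holds a list of such dicts;
# ChainMap presents them as one mapping keyed by image id, without copying.
merged = ChainMap(rank0_preds, rank1_preds)
print(sorted(merged.keys()))   # [0, 1, 2, 3]
print(merged[3])               # ('boxes3', 'cls3', 'scores3')
```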
yolox/models/yolo_head.py
CHANGED
@@ -166,6 +166,13 @@ class YOLOXHead(nn.Module):
                     torch.zeros(1, grid.shape[1]).fill_(stride_this_level).type_as(xin[0])
                 )
                 if self.use_l1:
+                    batch_size = reg_output.shape[0]
+                    hsize, wsize = reg_output.shape[-2:]
+                    reg_output = reg_output.view(batch_size, self.n_anchors, 4, hsize, wsize)
+                    reg_output = (
+                        reg_output.permute(0, 1, 3, 4, 2)
+                        .reshape(batch_size, -1, 4)
+                    )
                     origin_preds.append(reg_output.clone())
 
             else:
@@ -193,7 +200,7 @@ class YOLOXHead(nn.Module):
         batch_size = output.shape[0]
         n_ch = 5 + self.num_classes
         hsize, wsize = output.shape[-2:]
-        if grid.shape[2:
+        if grid.shape[2:4] != output.shape[2:4]:
             yv, xv = torch.meshgrid([torch.arange(hsize), torch.arange(wsize)])
             grid = torch.stack((xv, yv), 2).view(1, 1, hsize, wsize, 2).type(dtype)
             self.grids[k] = grid
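The lines added under `if self.use_l1:` reshape the raw regression map from `(batch, n_anchors * 4, h, w)` to `(batch, n_anchors * h * w, 4)`, so the L1 targets line up with the flattened predictions used in the loss. A standalone shape check of the same permute/reshape; the toy sizes are ours for illustration:

```python
import torch

batch_size, n_anchors, hsize, wsize = 2, 1, 3, 4
reg_output = torch.randn(batch_size, n_anchors * 4, hsize, wsize)   # raw regression map

reg_output = reg_output.view(batch_size, n_anchors, 4, hsize, wsize)
reg_output = reg_output.permute(0, 1, 3, 4, 2).reshape(batch_size, -1, 4)

# One 4-vector of box regression values per anchor position, in grid order.
assert reg_output.shape == (batch_size, n_anchors * hsize * wsize, 4)
print(reg_output.shape)   # torch.Size([2, 12, 4])
```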
yolox/utils/visualize.py
CHANGED
@@ -18,8 +18,8 @@ def vis(img, boxes, scores, cls_ids, conf=0.5, class_names=None):
             continue
         x0 = int(box[0])
         y0 = int(box[1])
-        x1 = int(box[
-        y1 = int(box[
+        x1 = int(box[2])
+        y1 = int(box[3])
 
         color = (_COLORS[cls_id] * 255).astype(np.uint8).tolist()
         text = '{}:{:.1f}%'.format(class_names[cls_id], score * 100)
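With this change `vis` treats each box as `(x0, y0, x1, y1)` rather than `(x, y, w, h)`, so columns 2 and 3 are already the bottom-right corner. A minimal drawing sketch under that assumption, using OpenCV directly; the image, colour, and label are placeholders:

```python
import cv2
import numpy as np

img = np.zeros((240, 320, 3), dtype=np.uint8)
box = [40.0, 30.0, 200.0, 180.0]           # xyxy box

x0, y0 = int(box[0]), int(box[1])
x1, y1 = int(box[2]), int(box[3])          # no width/height offset needed for xyxy boxes

cv2.rectangle(img, (x0, y0), (x1, y1), (0, 255, 0), 2)
cv2.putText(img, "person:87.5%", (x0, y0 - 4), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 255, 0), 1)
cv2.imwrite("vis_check.png", img)
```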