ruinmessi committed on
Commit
a541e3c
·
1 Parent(s): 91dbb11

feat(yolox): support torch amp and img caching, update preprocess logic (#523)

README.md CHANGED
@@ -10,6 +10,7 @@ This repo is an implementation of PyTorch version YOLOX, there is also a [MegEng
10
  <img src="assets/git_fig.png" width="1000" >
11
 
12
  ## Updates!!
 
13
  * 【2021/08/05】 We release [MegEngine version YOLOX](https://github.com/MegEngine/YOLOX).
14
  * 【2021/07/28】 We fix the fatal error of [memory leak](https://github.com/Megvii-BaseDetection/YOLOX/issues/103)
15
  * 【2021/07/26】 We now support [MegEngine](https://github.com/Megvii-BaseDetection/YOLOX/tree/main/demo/MegEngine) deployment.
@@ -24,6 +25,18 @@ This repo is an implementation of PyTorch version YOLOX, there is also a [MegEng
24
  ## Benchmark
25
 
26
  #### Standard Models.
 
27
  |Model |size |mAP<sup>test<br>0.5:0.95 | Speed V100<br>(ms) | Params<br>(M) |FLOPs<br>(G)| weights |
28
  | ------ |:---: | :---: |:---: |:---: | :---: | :----: |
29
  |[YOLOX-s](./exps/default/yolox_s.py) |640 |39.6 |9.8 |9.0 | 26.8 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EW62gmO2vnNNs5npxjzunVwB9p307qqygaCkXdTO88BLUg?e=NMTQYw)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_s.pth) |
@@ -32,11 +45,25 @@ This repo is an implementation of PyTorch version YOLOX, there is also a [MegEng
32
  |[YOLOX-x](./exps/default/yolox_x.py) |640 |**51.2** | 17.3 |99.1 |281.9 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EdgVPHBziOVBtGAXHfeHI5kBza0q9yyueMGdT0wXZfI1rQ?e=tABO5u)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_x.pth) |
33
  |[YOLOX-Darknet53](./exps/default/yolov3.py) |640 | 47.4 | 11.1 |63.7 | 185.3 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EZ-MV1r_fMFPkPrNjvbJEMoBLOLAnXH-XKEB77w8LhXL6Q?e=mf6wOc)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_darknet53.pth) |
34
 
 
 
35
  #### Light Models.
 
36
  |Model |size |mAP<sup>val<br>0.5:0.95 | Params<br>(M) |FLOPs<br>(G)| weights |
37
  | ------ |:---: | :---: |:---: |:---: | :---: |
38
- |[YOLOX-Nano](./exps/default/nano.py) |416 |25.3 | 0.91 |1.08 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EdcREey-krhLtdtSnxolxiUBjWMy6EFdiaO9bdOwZ5ygCQ?e=yQpdds)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_nano.pth) |
39
- |[YOLOX-Tiny](./exps/default/yolox_tiny.py) |416 |32.8 | 5.06 |6.45 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EbZuinX5X1dJmNy8nqSRegABWspKw3QpXxuO82YSoFN1oQ?e=Q7V7XE)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_tiny_32dot8.pth) |
40
 
41
  ## Quick Start
42
 
@@ -50,15 +77,8 @@ cd YOLOX
50
  pip3 install -U pip && pip3 install -r requirements.txt
51
  pip3 install -v -e . # or python3 setup.py develop
52
  ```
53
- Step2. Install [apex](https://github.com/NVIDIA/apex).
54
 
55
- ```shell
56
- # skip this step if you don't want to train model.
57
- git clone https://github.com/NVIDIA/apex
58
- cd apex
59
- pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
60
- ```
61
- Step3. Install [pycocotools](https://github.com/cocodataset/cocoapi).
62
 
63
  ```shell
64
  pip3 install cython; pip3 install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
@@ -100,7 +120,7 @@ ln -s /path/to/your/COCO ./datasets/COCO
100
  Step2. Reproduce our results on COCO by specifying -n:
101
 
102
  ```shell
103
- python tools/train.py -n yolox-s -d 8 -b 64 --fp16 -o
104
  yolox-m
105
  yolox-l
106
  yolox-x
@@ -108,10 +128,11 @@ python tools/train.py -n yolox-s -d 8 -b 64 --fp16 -o
108
  * -d: number of gpu devices
109
  * -b: total batch size, the recommended number for -b is num-gpu * 8
110
  * --fp16: mixed precision training
 
111
 
112
  When using -f, the above commands are equivalent to:
113
  ```shell
114
- python tools/train.py -f exps/default/yolox_s.py -d 8 -b 64 --fp16 -o
115
  exps/default/yolox_m.py
116
  exps/default/yolox_l.py
117
  exps/default/yolox_x.py
 
10
  <img src="assets/git_fig.png" width="1000" >
11
 
12
  ## Updates!!
13
+ * 【2021/08/19】 We optimized the training process, achieving **2x** faster training and **~1%** higher performance! See the [notes](docs/updates_note.md) for more details.
14
  * 【2021/08/05】 We release [MegEngine version YOLOX](https://github.com/MegEngine/YOLOX).
15
  * 【2021/07/28】 We fix the fatal error of [memory leak](https://github.com/Megvii-BaseDetection/YOLOX/issues/103)
16
  * 【2021/07/26】 We now support [MegEngine](https://github.com/Megvii-BaseDetection/YOLOX/tree/main/demo/MegEngine) deployment.
 
25
  ## Benchmark
26
 
27
  #### Standard Models.
28
+
29
+ |Model |size |mAP<sup>val<br>0.5:0.95 |mAP<sup>test<br>0.5:0.95 | Speed V100<br>(ms) | Params<br>(M) |FLOPs<br>(G)| weights |
30
+ | ------ |:---: | :---: | :---: |:---: |:---: | :---: | :----: |
31
+ |[YOLOX-s](./exps/default/yolox_s.py) |640 |40.5 |40.5 |9.8 |9.0 | 26.8 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_s.pth) |
32
+ |[YOLOX-m](./exps/default/yolox_m.py) |640 |46.9 |47.2 |12.3 |25.3 |73.8| [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_m.pth) |
33
+ |[YOLOX-l](./exps/default/yolox_l.py) |640 |47.7 |50.1 |14.5 |54.2| 155.6 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_l.pth) |
34
+ |[YOLOX-x](./exps/default/yolox_x.py) |640 |51.1 |**51.5** | 17.3 |99.1 |281.9 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_x.pth) |
35
+ |[YOLOX-Darknet53](./exps/default/yolov3.py) |640 | 47.7 | 48.0 | 11.1 |63.7 | 185.3 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_darknet.pth) |
36
+
37
+ <details>
38
+ <summary>Legacy models</summary>
39
+
40
  |Model |size |mAP<sup>test<br>0.5:0.95 | Speed V100<br>(ms) | Params<br>(M) |FLOPs<br>(G)| weights |
41
  | ------ |:---: | :---: |:---: |:---: | :---: | :----: |
42
  |[YOLOX-s](./exps/default/yolox_s.py) |640 |39.6 |9.8 |9.0 | 26.8 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EW62gmO2vnNNs5npxjzunVwB9p307qqygaCkXdTO88BLUg?e=NMTQYw)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_s.pth) |
 
45
  |[YOLOX-x](./exps/default/yolox_x.py) |640 |**51.2** | 17.3 |99.1 |281.9 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EdgVPHBziOVBtGAXHfeHI5kBza0q9yyueMGdT0wXZfI1rQ?e=tABO5u)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_x.pth) |
46
  |[YOLOX-Darknet53](./exps/default/yolov3.py) |640 | 47.4 | 11.1 |63.7 | 185.3 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EZ-MV1r_fMFPkPrNjvbJEMoBLOLAnXH-XKEB77w8LhXL6Q?e=mf6wOc)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_darknet53.pth) |
47
 
48
+ </details>
49
+
50
  #### Light Models.
51
+
52
  |Model |size |mAP<sup>val<br>0.5:0.95 | Params<br>(M) |FLOPs<br>(G)| weights |
53
  | ------ |:---: | :---: |:---: |:---: | :---: |
54
+ |[YOLOX-Nano](./exps/default/nano.py) |416 |25.8 | 0.91 |1.08 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_nano.pth) |
55
+ |[YOLOX-Tiny](./exps/default/yolox_tiny.py) |416 |32.8 | 5.06 |6.45 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_tiny.pth) |
56
+
57
+
58
+ <details>
59
+ <summary>Legacy models</summary>
60
+
61
+ |Model |size |mAP<sup>val<br>0.5:0.95 | Params<br>(M) |FLOPs<br>(G)| weights |
62
+ | ------ |:---: | :---: |:---: |:---: | :---: |
63
+ |[YOLOX-Nano](./exps/default/nano.py) |416 |25.3 | 0.91 |1.08 | [github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_nano.pth) |
64
+ |[YOLOX-Tiny](./exps/default/yolox_tiny.py) |416 |32.8 | 5.06 |6.45 | [github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_tiny_32dot8.pth) |
65
+
66
+ </details>
67
 
68
  ## Quick Start
69
 
 
77
  pip3 install -U pip && pip3 install -r requirements.txt
78
  pip3 install -v -e . # or python3 setup.py develop
79
  ```
 
80
 
81
+ Step2. Install [pycocotools](https://github.com/cocodataset/cocoapi).
82
 
83
  ```shell
84
  pip3 install cython; pip3 install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
 
120
  Step2. Reproduce our results on COCO by specifying -n:
121
 
122
  ```shell
123
+ python tools/train.py -n yolox-s -d 8 -b 64 --fp16 -o [--cache]
124
  yolox-m
125
  yolox-l
126
  yolox-x
 
128
  * -d: number of gpu devices
129
  * -b: total batch size, the recommended number for -b is num-gpu * 8
130
  * --fp16: mixed precision training
131
+ * --cache: cache images into RAM to accelerate training; this requires a large amount of system RAM.
132
 
133
  When using -f, the above commands are equivalent to:
134
  ```shell
135
+ python tools/train.py -f exps/default/yolox_s.py -d 8 -b 64 --fp16 -o [--cache]
136
  exps/default/yolox_m.py
137
  exps/default/yolox_l.py
138
  exps/default/yolox_x.py
demo/MegEngine/cpp/yolox.cpp CHANGED
@@ -35,17 +35,14 @@ cv::Mat static_resize(cv::Mat &img) {
35
  }
36
 
37
  void blobFromImage(cv::Mat &img, float *blob_data) {
38
- cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
39
  int channels = 3;
40
  int img_h = img.rows;
41
  int img_w = img.cols;
42
- std::vector<float> mean = {0.485, 0.456, 0.406};
43
- std::vector<float> std = {0.229, 0.224, 0.225};
44
  for (size_t c = 0; c < channels; c++) {
45
  for (size_t h = 0; h < img_h; h++) {
46
  for (size_t w = 0; w < img_w; w++) {
47
  blob_data[c * img_w * img_h + h * img_w + w] =
48
- (((float)img.at<cv::Vec3b>(h, w)[c]) / 255.0f - mean[c]) / std[c];
49
  }
50
  }
51
  }
 
35
  }
36
 
37
  void blobFromImage(cv::Mat &img, float *blob_data) {
 
38
  int channels = 3;
39
  int img_h = img.rows;
40
  int img_w = img.cols;
 
 
41
  for (size_t c = 0; c < channels; c++) {
42
  for (size_t h = 0; h < img_h; h++) {
43
  for (size_t w = 0; w < img_w; w++) {
44
  blob_data[c * img_w * img_h + h * img_w + w] =
45
+ (float)img.at<cv::Vec3b>(h, w)[c];
46
  }
47
  }
48
  }
demo/MegEngine/python/demo.py CHANGED
@@ -107,8 +107,6 @@ class Predictor(object):
107
  self.confthre = confthre
108
  self.nmsthre = nmsthre
109
  self.test_size = test_size
110
- self.rgb_means = (0.485, 0.456, 0.406)
111
- self.std = (0.229, 0.224, 0.225)
112
 
113
  def inference(self, img):
114
  img_info = {"id": 0}
@@ -125,7 +123,7 @@ class Predictor(object):
125
  img_info["width"] = width
126
  img_info["raw_img"] = img
127
 
128
- img, ratio = preprocess(img, self.test_size, self.rgb_means, self.std)
129
  img_info["ratio"] = ratio
130
  img = F.expand_dims(mge.tensor(img), 0)
131
 
 
107
  self.confthre = confthre
108
  self.nmsthre = nmsthre
109
  self.test_size = test_size
 
 
110
 
111
  def inference(self, img):
112
  img_info = {"id": 0}
 
123
  img_info["width"] = width
124
  img_info["raw_img"] = img
125
 
126
+ img, ratio = preprocess(img, self.test_size)
127
  img_info["ratio"] = ratio
128
  img = F.expand_dims(mge.tensor(img), 0)
129
 
demo/ONNXRuntime/README.md CHANGED
@@ -6,13 +6,13 @@ This doc introduces how to convert your pytorch model into onnx, and how to run
6
 
7
  | Model | Parameters | GFLOPs | Test Size | mAP | Weights |
8
  |:------| :----: | :----: | :---: | :---: | :---: |
9
- | YOLOX-Nano | 0.91M | 1.08 | 416x416 | 25.3 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EfAGwvevU-lNhW5OqFAyHbwBJdI_7EaKu5yU04fgF5BU7w?e=gvq4hf)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_nano.onnx) |
10
- | YOLOX-Tiny | 5.06M | 6.45 | 416x416 |32.8 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/ET64VPoEV8FAm5YBiEj5JXwBVn_KYHM38iJQ_lpcK2slYw?e=uuJ7Ii)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_tiny_32dot8.onnx) |
11
- | YOLOX-S | 9.0M | 26.8 | 640x640 |39.6 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/Ec0L1d1x2UtIpbfiahgxhtgBZVjb1NCXbotO8SCOdMqpQQ?e=siyIsK)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_s.onnx) |
12
- | YOLOX-M | 25.3M | 73.8 | 640x640 |46.4 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/ERUKlQe-nlxBoTKPy1ynbxsBmAZ_h-VBEV-nnfPdzUIkZQ?e=hyQQtl)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_m.onnx) |
13
- | YOLOX-L | 54.2M | 155.6 | 640x640 |50.0 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/ET5w926jCA5GlVfg9ixB4KEBiW0HYl7SzaHNRaRG9dYO_A?e=ISmCYX)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_l.onnx) |
14
- | YOLOX-Darknet53| 63.72M | 185.3 | 640x640 |47.3 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/ESArloSW-MlPlLuemLh9zKkBdovgweKbfu4zkvzKAp7pPQ?e=f81Ikw)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_darknet53.onnx) |
15
- | YOLOX-X | 99.1M | 281.9 | 640x640 |51.2 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/ERjqoeMJlFdGuM3tQfXQmhABmGHlIHydWCwhlugeWLE9AA)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox.onnx) |
16
 
17
 
18
  ### Convert Your Model to ONNX
 
6
 
7
  | Model | Parameters | GFLOPs | Test Size | mAP | Weights |
8
  |:------| :----: | :----: | :---: | :---: | :---: |
9
+ | YOLOX-Nano | 0.91M | 1.08 | 416x416 | 25.8 |[github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_nano.onnx) |
10
+ | YOLOX-Tiny | 5.06M | 6.45 | 416x416 |32.8 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_tiny.onnx) |
11
+ | YOLOX-S | 9.0M | 26.8 | 640x640 |40.5 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_s.onnx) |
12
+ | YOLOX-M | 25.3M | 73.8 | 640x640 |47.2 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_m.onnx) |
13
+ | YOLOX-L | 54.2M | 155.6 | 640x640 |50.1 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_l.onnx) |
14
+ | YOLOX-Darknet53| 63.72M | 185.3 | 640x640 |48.0 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_darknet.onnx) |
15
+ | YOLOX-X | 99.1M | 281.9 | 640x640 |51.5 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox.onnx) |
16
 
17
 
18
  ### Convert Your Model to ONNX
demo/ONNXRuntime/onnx_inference.py CHANGED
@@ -64,9 +64,7 @@ if __name__ == '__main__':
64
 
65
  input_shape = tuple(map(int, args.input_shape.split(',')))
66
  origin_img = cv2.imread(args.image_path)
67
- mean = (0.485, 0.456, 0.406)
68
- std = (0.229, 0.224, 0.225)
69
- img, ratio = preprocess(origin_img, input_shape, mean, std)
70
 
71
  session = onnxruntime.InferenceSession(args.model)
72
 
 
64
 
65
  input_shape = tuple(map(int, args.input_shape.split(',')))
66
  origin_img = cv2.imread(args.image_path)
67
+ img, ratio = preprocess(origin_img, input_shape)
 
 
68
 
69
  session = onnxruntime.InferenceSession(args.model)
70
 
demo/OpenVINO/cpp/README.md CHANGED
@@ -6,13 +6,13 @@ This toturial includes a C++ demo for OpenVINO, as well as some converted models
6
 
7
  | Model | Parameters | GFLOPs | Test Size | mAP | Weights |
8
  |:------| :----: | :----: | :---: | :---: | :---: |
9
- | [YOLOX-Nano](../../../exps/nano.py) | 0.91M | 1.08 | 416x416 | 25.3 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EeWY57o5wQZFtXYd1KJw6Z8B4vxZru649XxQHYIFgio3Qw?e=ZS81ce)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_nano_openvino.tar.gz) |
10
- | [YOLOX-Tiny](../../../exps/yolox_tiny.py) | 5.06M | 6.45 | 416x416 |31.7 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/ETfvOoCXdVZNinoSpKA_sEYBIQVqfjjF5_M6VvHRnLVcsA?e=STL1pi)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_tiny_openvino.tar.gz) |
11
- | [YOLOX-S](../../../exps/yolox_s.py) | 9.0M | 26.8 | 640x640 |39.6 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EXUjf3PQnbBLrxNrXPueqaIBzVZOrYQOnJpLK1Fytj5ssA?e=GK0LOM)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_s_openvino.tar.gz) |
12
- | [YOLOX-M](../../../exps/yolox_m.py) | 25.3M | 73.8 | 640x640 |46.4 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EcoT1BPpeRpLvE_4c441zn8BVNCQ2naxDH3rho7WqdlgLQ?e=95VaM9)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_m_openvino.tar.gz) |
13
- | [YOLOX-L](../../../exps/yolox_l.py) | 54.2M | 155.6 | 640x640 |50.0 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EZvmn-YLRuVPh0GAP_w3xHMB2VGvrKqQXyK_Cv5yi_DXUg?e=YRh6Eq)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_l_openvino.tar.gz) |
14
- | [YOLOX-Darknet53](../../../exps/yolov3.py) | 63.72M | 185.3 | 640x640 |47.3 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EQP8LSroikFHuwX0jFRetmcBOCDWSFmylHxolV7ezUPXGw?e=bEw5iq)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_darknet53_openvino.tar.gz) |
15
- | [YOLOX-X](../../../exps/yolox_x.py) | 99.1M | 281.9 | 640x640 |51.2 | [Download](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EZFPnLqiD-xIlt7rcZYDjQgB4YXE9wnq1qaSXQwJrsKbdg?e=83nwEz)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_x_openvino.tar.gz) |
16
 
17
  ## Install OpenVINO Toolkit
18
 
@@ -72,9 +72,11 @@ source ~/.bashrc
72
  ```
73
  For example:
74
  ```shell
75
- python3 mo.py --input_model yolox.onnx --input_shape (1,3,640,640) --data_type FP16
76
  ```
77
 
 
 
78
  ## Build
79
 
80
  ### Linux
 
6
 
7
  | Model | Parameters | GFLOPs | Test Size | mAP | Weights |
8
  |:------| :----: | :----: | :---: | :---: | :---: |
9
+ | [YOLOX-Nano](../../../exps/default/nano.py) | 0.91M | 1.08 | 416x416 | 25.8 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_nano_openvino.tar.gz) |
10
+ | [YOLOX-Tiny](../../../exps/default/yolox_tiny.py) | 5.06M | 6.45 | 416x416 |32.8 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_tiny_openvino.tar.gz) |
11
+ | [YOLOX-S](../../../exps/default/yolox_s.py) | 9.0M | 26.8 | 640x640 |40.5 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_s_openvino.tar.gz) |
12
+ | [YOLOX-M](../../../exps/default/yolox_m.py) | 25.3M | 73.8 | 640x640 |47.2 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_m_openvino.tar.gz) |
13
+ | [YOLOX-L](../../../exps/default/yolox_l.py) | 54.2M | 155.6 | 640x640 |50.1 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_l_openvino.tar.gz) |
14
+ | [YOLOX-Darknet53](../../../exps/default/yolov3.py) | 63.72M | 185.3 | 640x640 |48.0 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_dark_openvino.tar.gz) |
15
+ | [YOLOX-X](../../../exps/default/yolox_x.py) | 99.1M | 281.9 | 640x640 |51.5 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_x_openvino.tar.gz) |
16
 
17
  ## Install OpenVINO Toolkit
18
 
 
72
  ```
73
  For example:
74
  ```shell
75
+ python3 mo.py --input_model yolox_tiny.onnx --input_shape [1,3,416,416] --data_type FP16
76
  ```
77
 
78
+ Make sure the input shape is consistent with [those](yolox_openvino.cpp#L24-L25) in the cpp file.
79
+
80
  ## Build
81
 
82
  ### Linux
demo/OpenVINO/cpp/yolox_openvino.cpp CHANGED
@@ -37,12 +37,9 @@ cv::Mat static_resize(cv::Mat& img) {
37
  }
38
 
39
  void blobFromImage(cv::Mat& img, Blob::Ptr& blob){
40
- cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
41
  int channels = 3;
42
  int img_h = img.rows;
43
  int img_w = img.cols;
44
- std::vector<float> mean = {0.485, 0.456, 0.406};
45
- std::vector<float> std = {0.229, 0.224, 0.225};
46
  InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
47
  if (!mblob)
48
  {
@@ -61,7 +58,7 @@ void blobFromImage(cv::Mat& img, Blob::Ptr& blob){
61
  for (size_t w = 0; w < img_w; w++)
62
  {
63
  blob_data[c * img_w * img_h + h * img_w + w] =
64
- (((float)img.at<cv::Vec3b>(h, w)[c]) / 255.0f - mean[c]) / std[c];
65
  }
66
  }
67
  }
@@ -513,7 +510,6 @@ int main(int argc, char* argv[]) {
513
  auto moutputHolder = moutput->rmap();
514
  const float* net_pred = moutputHolder.as<const PrecisionTrait<Precision::FP32>::value_type*>();
515
 
516
- const int image_size = 416;
517
  int img_w = image.cols;
518
  int img_h = image.rows;
519
  float scale = std::min(INPUT_W / (image.cols*1.0), INPUT_H / (image.rows*1.0));
 
37
  }
38
 
39
  void blobFromImage(cv::Mat& img, Blob::Ptr& blob){
 
40
  int channels = 3;
41
  int img_h = img.rows;
42
  int img_w = img.cols;
 
 
43
  InferenceEngine::MemoryBlob::Ptr mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
44
  if (!mblob)
45
  {
 
58
  for (size_t w = 0; w < img_w; w++)
59
  {
60
  blob_data[c * img_w * img_h + h * img_w + w] =
61
+ (float)img.at<cv::Vec3b>(h, w)[c];
62
  }
63
  }
64
  }
 
510
  auto moutputHolder = moutput->rmap();
511
  const float* net_pred = moutputHolder.as<const PrecisionTrait<Precision::FP32>::value_type*>();
512
 
 
513
  int img_w = image.cols;
514
  int img_h = image.rows;
515
  float scale = std::min(INPUT_W / (image.cols*1.0), INPUT_H / (image.rows*1.0));
demo/OpenVINO/python/README.md CHANGED
@@ -6,13 +6,13 @@ This toturial includes a Python demo for OpenVINO, as well as some converted mod
6
 
7
  | Model | Parameters | GFLOPs | Test Size | mAP | Weights |
8
  |:------| :----: | :----: | :---: | :---: | :---: |
9
- | [YOLOX-Nano](../../../exps/default/nano.py) | 0.91M | 1.08 | 416x416 | 25.3 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EeWY57o5wQZFtXYd1KJw6Z8B4vxZru649XxQHYIFgio3Qw?e=ZS81ce)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_nano_openvino.tar.gz) |
10
- | [YOLOX-Tiny](../../../exps/default/yolox_tiny.py) | 5.06M | 6.45 | 416x416 |31.7 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/ETfvOoCXdVZNinoSpKA_sEYBIQVqfjjF5_M6VvHRnLVcsA?e=STL1pi)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_tiny_openvino.tar.gz) |
11
- | [YOLOX-S](../../../exps/default/yolox_s.py) | 9.0M | 26.8 | 640x640 |39.6 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EXUjf3PQnbBLrxNrXPueqaIBzVZOrYQOnJpLK1Fytj5ssA?e=GK0LOM)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_s_openvino.tar.gz) |
12
- | [YOLOX-M](../../../exps/default/yolox_m.py) | 25.3M | 73.8 | 640x640 |46.4 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EcoT1BPpeRpLvE_4c441zn8BVNCQ2naxDH3rho7WqdlgLQ?e=95VaM9)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_m_openvino.tar.gz) |
13
- | [YOLOX-L](../../../exps/default/yolox_l.py) | 54.2M | 155.6 | 640x640 |50.0 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EZvmn-YLRuVPh0GAP_w3xHMB2VGvrKqQXyK_Cv5yi_DXUg?e=YRh6Eq)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_l_openvino.tar.gz) |
14
- | [YOLOX-Darknet53](../../../exps/default/yolov3.py) | 63.72M | 185.3 | 640x640 |47.3 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EQP8LSroikFHuwX0jFRetmcBOCDWSFmylHxolV7ezUPXGw?e=bEw5iq)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_darknet53_openvino.tar.gz) |
15
- | [YOLOX-X](../../../exps/default/yolox_x.py) | 99.1M | 281.9 | 640x640 |51.2 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EZFPnLqiD-xIlt7rcZYDjQgB4YXE9wnq1qaSXQwJrsKbdg?e=83nwEz)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_x_openvino.tar.gz) |
16
 
17
  ## Install OpenVINO Toolkit
18
 
 
6
 
7
  | Model | Parameters | GFLOPs | Test Size | mAP | Weights |
8
  |:------| :----: | :----: | :---: | :---: | :---: |
9
+ | [YOLOX-Nano](../../../exps/default/nano.py) | 0.91M | 1.08 | 416x416 | 25.8 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_nano_openvino.tar.gz) |
10
+ | [YOLOX-Tiny](../../../exps/default/yolox_tiny.py) | 5.06M | 6.45 | 416x416 |32.8 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_tiny_openvino.tar.gz) |
11
+ | [YOLOX-S](../../../exps/default/yolox_s.py) | 9.0M | 26.8 | 640x640 |40.5 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_s_openvino.tar.gz) |
12
+ | [YOLOX-M](../../../exps/default/yolox_m.py) | 25.3M | 73.8 | 640x640 |47.2 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_m_openvino.tar.gz) |
13
+ | [YOLOX-L](../../../exps/default/yolox_l.py) | 54.2M | 155.6 | 640x640 |50.1 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_l_openvino.tar.gz) |
14
+ | [YOLOX-Darknet53](../../../exps/default/yolov3.py) | 63.72M | 185.3 | 640x640 |48.0 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_dark_openvino.tar.gz) |
15
+ | [YOLOX-X](../../../exps/default/yolox_x.py) | 99.1M | 281.9 | 640x640 |51.5 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_x_openvino.tar.gz) |
16
 
17
  ## Install OpenVINO Toolkit
18
 
demo/OpenVINO/python/openvino_inference.py CHANGED
@@ -119,9 +119,7 @@ def main():
119
  # ---------------------------Step 6. Prepare input---------------------------------------------------------------------
120
  origin_img = cv2.imread(args.input)
121
  _, _, h, w = net.input_info[input_blob].input_data.shape
122
- mean = (0.485, 0.456, 0.406)
123
- std = (0.229, 0.224, 0.225)
124
- image, ratio = preprocess(origin_img, (h, w), mean, std)
125
 
126
  # ---------------------------Step 7. Do inference----------------------------------------------------------------------
127
  log.info('Starting inference in synchronous mode')
 
119
  # ---------------------------Step 6. Prepare input---------------------------------------------------------------------
120
  origin_img = cv2.imread(args.input)
121
  _, _, h, w = net.input_info[input_blob].input_data.shape
122
+ image, ratio = preprocess(origin_img, (h, w))
 
 
123
 
124
  # ---------------------------Step 7. Do inference----------------------------------------------------------------------
125
  log.info('Starting inference in synchronous mode')
demo/TensorRT/cpp/yolox.cpp CHANGED
@@ -207,14 +207,10 @@ static void generate_yolox_proposals(std::vector<GridAndStride> grid_strides, fl
207
  }
208
 
209
  float* blobFromImage(cv::Mat& img){
210
- cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
211
-
212
  float* blob = new float[img.total()*3];
213
  int channels = 3;
214
  int img_h = img.rows;
215
  int img_w = img.cols;
216
- std::vector<float> mean = {0.485, 0.456, 0.406};
217
- std::vector<float> std = {0.229, 0.224, 0.225};
218
  for (size_t c = 0; c < channels; c++)
219
  {
220
  for (size_t h = 0; h < img_h; h++)
@@ -222,7 +218,7 @@ float* blobFromImage(cv::Mat& img){
222
  for (size_t w = 0; w < img_w; w++)
223
  {
224
  blob[c * img_w * img_h + h * img_w + w] =
225
- (((float)img.at<cv::Vec3b>(h, w)[c]) / 255.0f - mean[c]) / std[c];
226
  }
227
  }
228
  }
 
207
  }
208
 
209
  float* blobFromImage(cv::Mat& img){
 
 
210
  float* blob = new float[img.total()*3];
211
  int channels = 3;
212
  int img_h = img.rows;
213
  int img_w = img.cols;
 
 
214
  for (size_t c = 0; c < channels; c++)
215
  {
216
  for (size_t h = 0; h < img_h; h++)
 
218
  for (size_t w = 0; w < img_w; w++)
219
  {
220
  blob[c * img_w * img_h + h * img_w + w] =
221
+ (float)img.at<cv::Vec3b>(h, w)[c];
222
  }
223
  }
224
  }
demo/ncnn/cpp/yolox.cpp CHANGED
@@ -279,7 +279,7 @@ static int detect_yolox(const cv::Mat& bgr, std::vector<Object>& objects)
279
  h = YOLOX_TARGET_SIZE;
280
  w = w * scale;
281
  }
282
- ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, w, h);
283
 
284
  // pad to YOLOX_TARGET_SIZE rectangle
285
  int wpad = YOLOX_TARGET_SIZE - w;
@@ -289,13 +289,6 @@ static int detect_yolox(const cv::Mat& bgr, std::vector<Object>& objects)
289
  // which means users don't need to extra padding info to decode boxes coordinate.
290
  ncnn::copy_make_border(in, in_pad, 0, hpad, 0, wpad, ncnn::BORDER_CONSTANT, 114.f);
291
 
292
- // python 0-1 input tensor with rgb_means = (0.485, 0.456, 0.406), std = (0.229, 0.224, 0.225)
293
- // so for 0-255 input image, rgb_mean should multiply 255 and norm should div by std.
294
- const float mean_vals[3] = {255.f * 0.485f, 255.f * 0.456, 255.f * 0.406f};
295
- const float norm_vals[3] = {1 / (255.f * 0.229f), 1 / (255.f * 0.224f), 1 / (255.f * 0.225f)};
296
-
297
- in_pad.substract_mean_normalize(mean_vals, norm_vals);
298
-
299
  ncnn::Extractor ex = yolox.create_extractor();
300
 
301
  ex.input("images", in_pad);
 
279
  h = YOLOX_TARGET_SIZE;
280
  w = w * scale;
281
  }
282
+ ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR, img_w, img_h, w, h);
283
 
284
  // pad to YOLOX_TARGET_SIZE rectangle
285
  int wpad = YOLOX_TARGET_SIZE - w;
 
289
  // which means users don't need to extra padding info to decode boxes coordinate.
290
  ncnn::copy_make_border(in, in_pad, 0, hpad, 0, wpad, ncnn::BORDER_CONSTANT, 114.f);
291
 
292
  ncnn::Extractor ex = yolox.create_extractor();
293
 
294
  ex.input("images", in_pad);
docs/model_zoo.md CHANGED
@@ -2,17 +2,41 @@
2
 
3
  ## Standard Models.
4
 
5
  |Model |size |mAP<sup>test<br>0.5:0.95 | Speed V100<br>(ms) | Params<br>(M) |FLOPs<br>(G)| weights |
6
  | ------ |:---: | :---: |:---: |:---: | :---: | :----: |
7
- |[YOLOX-s](https://github.com/Megvii-BaseDetection/YOLOX/blob/main/exps/default/yolox_s.py) |640 |39.6 |9.8 |9.0 | 26.8 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EW62gmO2vnNNs5npxjzunVwB9p307qqygaCkXdTO88BLUg?e=NMTQYw)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_s.pth) |
8
- |[YOLOX-m](https://github.com/Megvii-BaseDetection/YOLOX/blob/main/exps/default/yolox_m.py) |640 |46.4 |12.3 |25.3 |73.8| [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/ERMTP7VFqrVBrXKMU7Vl4TcBQs0SUeCT7kvc-JdIbej4tQ?e=1MDo9y)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_m.pth) |
9
- |[YOLOX-l](https://github.com/Megvii-BaseDetection/YOLOX/blob/main/exps/default/yolox_l.py) |640 |50.0 |14.5 |54.2| 155.6 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EWA8w_IEOzBKvuueBqfaZh0BeoG5sVzR-XYbOJO4YlOkRw?e=wHWOBE)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_l.pth) |
10
- |[YOLOX-x](https://github.com/Megvii-BaseDetection/YOLOX/blob/main/exps/default/yolox_x.py) |640 |**51.2** | 17.3 |99.1 |281.9 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EdgVPHBziOVBtGAXHfeHI5kBza0q9yyueMGdT0wXZfI1rQ?e=tABO5u)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_x.pth) |
11
- |[YOLOX-Darknet53](https://github.com/Megvii-BaseDetection/YOLOX/blob/main/exps/default/yolov3.py) |640 | 47.4 | 11.1 |63.7 | 185.3 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EZ-MV1r_fMFPkPrNjvbJEMoBLOLAnXH-XKEB77w8LhXL6Q?e=mf6wOc)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_darknet53.pth) |
 
 
12
 
13
  ## Light Models.
14
 
15
  |Model |size |mAP<sup>val<br>0.5:0.95 | Params<br>(M) |FLOPs<br>(G)| weights |
16
  | ------ |:---: | :---: |:---: |:---: | :---: |
17
- |[YOLOX-Nano](https://github.com/Megvii-BaseDetection/YOLOX/blob/main/exps/default/nano.py) |416 |25.3 | 0.91 |1.08 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EdcREey-krhLtdtSnxolxiUBjWMy6EFdiaO9bdOwZ5ygCQ?e=yQpdds)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_nano.pth) |
18
- |[YOLOX-Tiny](https://github.com/Megvii-BaseDetection/YOLOX/blob/main/exps/default/yolox_tiny.py) |416 |32.8 | 5.06 |6.45 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EbZuinX5X1dJmNy8nqSRegABWspKw3QpXxuO82YSoFN1oQ?e=Q7V7XE)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_tiny_32dot8.pth) |
 
2
 
3
  ## Standard Models.
4
 
5
+ |Model |size |mAP<sup>val<br>0.5:0.95 |mAP<sup>test<br>0.5:0.95 | Speed V100<br>(ms) | Params<br>(M) |FLOPs<br>(G)| weights |
6
+ | ------ |:---: | :---: | :---: |:---: |:---: | :---: | :----: |
7
+ |[YOLOX-s](./exps/default/yolox_s.py) |640 |40.5 |40.5 |9.8 |9.0 | 26.8 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_s.pth) |
8
+ |[YOLOX-m](./exps/default/yolox_m.py) |640 |46.9 |47.2 |12.3 |25.3 |73.8| [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_m.pth) |
9
+ |[YOLOX-l](./exps/default/yolox_l.py) |640 |47.7 |50.1 |14.5 |54.2| 155.6 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_l.pth) |
10
+ |[YOLOX-x](./exps/default/yolox_x.py) |640 |51.1 |**51.5** | 17.3 |99.1 |281.9 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_x.pth) |
11
+ |[YOLOX-Darknet53](./exps/default/yolov3.py) |640 | 47.7 | 48.0 | 11.1 |63.7 | 185.3 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_darknet.pth) |
12
+
13
+ <details>
14
+ <summary>Legacy models</summary>
15
+
16
  |Model |size |mAP<sup>test<br>0.5:0.95 | Speed V100<br>(ms) | Params<br>(M) |FLOPs<br>(G)| weights |
17
  | ------ |:---: | :---: |:---: |:---: | :---: | :----: |
18
+ |[YOLOX-s](./exps/default/yolox_s.py) |640 |39.6 |9.8 |9.0 | 26.8 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EW62gmO2vnNNs5npxjzunVwB9p307qqygaCkXdTO88BLUg?e=NMTQYw)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_s.pth) |
19
+ |[YOLOX-m](./exps/default/yolox_m.py) |640 |46.4 |12.3 |25.3 |73.8| [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/ERMTP7VFqrVBrXKMU7Vl4TcBQs0SUeCT7kvc-JdIbej4tQ?e=1MDo9y)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_m.pth) |
20
+ |[YOLOX-l](./exps/default/yolox_l.py) |640 |50.0 |14.5 |54.2| 155.6 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EWA8w_IEOzBKvuueBqfaZh0BeoG5sVzR-XYbOJO4YlOkRw?e=wHWOBE)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_l.pth) |
21
+ |[YOLOX-x](./exps/default/yolox_x.py) |640 |**51.2** | 17.3 |99.1 |281.9 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EdgVPHBziOVBtGAXHfeHI5kBza0q9yyueMGdT0wXZfI1rQ?e=tABO5u)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_x.pth) |
22
+ |[YOLOX-Darknet53](./exps/default/yolov3.py) |640 | 47.4 | 11.1 |63.7 | 185.3 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EZ-MV1r_fMFPkPrNjvbJEMoBLOLAnXH-XKEB77w8LhXL6Q?e=mf6wOc)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_darknet53.pth) |
23
+
24
+ </details>
25
 
26
  ## Light Models.
27
 
28
  |Model |size |mAP<sup>val<br>0.5:0.95 | Params<br>(M) |FLOPs<br>(G)| weights |
29
  | ------ |:---: | :---: |:---: |:---: | :---: |
30
+ |[YOLOX-Nano](./exps/default/nano.py) |416 |25.8 | 0.91 |1.08 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_nano.pth) |
31
+ |[YOLOX-Tiny](./exps/default/yolox_tiny.py) |416 |32.8 | 5.06 |6.45 | [github](https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_tiny.pth) |
32
+
33
+
34
+ <details>
35
+ <summary>Legacy models</summary>
36
+
37
+ |Model |size |mAP<sup>val<br>0.5:0.95 | Params<br>(M) |FLOPs<br>(G)| weights |
38
+ | ------ |:---: | :---: |:---: |:---: | :---: |
39
+ |[YOLOX-Nano](./exps/default/nano.py) |416 |25.3 | 0.91 |1.08 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EdcREey-krhLtdtSnxolxiUBjWMy6EFdiaO9bdOwZ5ygCQ?e=yQpdds)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_nano.pth) |
40
+ |[YOLOX-Tiny](./exps/default/yolox_tiny.py) |416 |32.8 | 5.06 |6.45 | [onedrive](https://megvii-my.sharepoint.cn/:u:/g/personal/gezheng_megvii_com/EbZuinX5X1dJmNy8nqSRegABWspKw3QpXxuO82YSoFN1oQ?e=Q7V7XE)/[github](https://github.com/Megvii-BaseDetection/storage/releases/download/0.0.1/yolox_tiny_32dot8.pth) |
41
+
42
+ </details>
docs/quick_run.md CHANGED
@@ -10,15 +10,7 @@ cd YOLOX
10
  pip3 install -U pip && pip3 install -r requirements.txt
11
  pip3 install -v -e . # or python3 setup.py develop
12
  ```
13
- Step2. Install [apex](https://github.com/NVIDIA/apex).
14
-
15
- ```shell
16
- # skip this step if you don't want to train model.
17
- git clone https://github.com/NVIDIA/apex
18
- cd apex
19
- pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
20
- ```
21
- Step3. Install [pycocotools](https://github.com/cocodataset/cocoapi).
22
 
23
  ```shell
24
  pip3 install cython; pip3 install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
@@ -54,7 +46,7 @@ ln -s /path/to/your/COCO ./datasets/COCO
54
  Step2. Reproduce our results on COCO by specifying -n:
55
 
56
  ```shell
57
- python tools/train.py -n yolox-s -d 8 -b 64 --fp16 -o
58
  yolox-m
59
  yolox-l
60
  yolox-x
@@ -62,6 +54,7 @@ python tools/train.py -n yolox-s -d 8 -b 64 --fp16 -o
62
  * -d: number of gpu devices
63
  * -b: total batch size, the recommended number for -b is num-gpu * 8
64
  * --fp16: mixed precision training
 
65
 
66
  **Multi Machine Training**
67
 
@@ -72,7 +65,7 @@ We also support multi-nodes training. Just add the following args:
72
  When using -f, the above commands are equivalent to:
73
 
74
  ```shell
75
- python tools/train.py -f exps/default/yolox-s.py -d 8 -b 64 --fp16 -o
76
  exps/default/yolox-m.py
77
  exps/default/yolox-l.py
78
  exps/default/yolox-x.py
 
10
  pip3 install -U pip && pip3 install -r requirements.txt
11
  pip3 install -v -e . # or python3 setup.py develop
12
  ```
13
+ Step2. Install [pycocotools](https://github.com/cocodataset/cocoapi).
 
14
 
15
  ```shell
16
  pip3 install cython; pip3 install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
 
46
  Step2. Reproduce our results on COCO by specifying -n:
47
 
48
  ```shell
49
+ python tools/train.py -n yolox-s -d 8 -b 64 --fp16 -o [--cache]
50
  yolox-m
51
  yolox-l
52
  yolox-x
 
54
  * -d: number of gpu devices
55
  * -b: total batch size, the recommended number for -b is num-gpu * 8
56
  * --fp16: mixed precision training
57
+ * --cache: cache images into RAM to accelerate training; this requires a large amount of system RAM.
58
 
59
  **Multi Machine Training**
60
 
 
65
  When using -f, the above commands are equivalent to:
66
 
67
  ```shell
68
+ python tools/train.py -f exps/default/yolox-s.py -d 8 -b 64 --fp16 -o [--cache]
69
  exps/default/yolox-m.py
70
  exps/default/yolox-l.py
71
  exps/default/yolox-x.py
docs/train_custom_data.md CHANGED
@@ -69,12 +69,13 @@ Except special cases, we always recommend to use our [COCO pretrained weights](h
69
 
70
  Once you get the Exp file and the COCO pretrained weights we provided, you can train your own model by the following below command:
71
  ```bash
72
- python tools/train.py -f /path/to/your/Exp/file -d 8 -b 64 --fp16 -o -c /path/to/the/pretrained/weights
73
  ```
 
74
 
75
  or take the `YOLOX-S` VOC training for example:
76
  ```bash
77
- python tools/train.py -f exps/example/yolox_voc/yolox_voc_s.py -d 8 -b 64 --fp16 -o -c /path/to/yolox_s.pth
78
  ```
79
 
80
  ✧✧✧ For example:
 
69
 
70
  Once you get the Exp file and the COCO pretrained weights we provided, you can train your own model by the following below command:
71
  ```bash
72
+ python tools/train.py -f /path/to/your/Exp/file -d 8 -b 64 --fp16 -o -c /path/to/the/pretrained/weights [--cache]
73
  ```
74
+ * --cache: we now support RAM caching to speed up training! Make sure you have enough system RAM when using it.
75
 
76
  or take the `YOLOX-S` VOC training for example:
77
  ```bash
78
+ python tools/train.py -f exps/example/yolox_voc/yolox_voc_s.py -d 8 -b 64 --fp16 -o -c /path/to/yolox_s.pth [--cache]
79
  ```
80
 
81
  ✧✧✧ For example:
docs/updates_note.md ADDED
@@ -0,0 +1,55 @@
1
+
2
+ # Update notes
3
+
4
+ ## 【2021/08/19】
5
+
6
+ * Support image caching for faster training, which requires a large amount of system RAM.
7
+ * Remove the dependency on Apex and support torch amp training.
8
+ * Optimize the preprocessing for faster training.
9
+ * Replace the older distortion augmentation with a new HSV augmentation for faster training and better performance (see the sketch below).
10
+
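For reference, here is a minimal sketch of the kind of HSV color jitter described above, written with OpenCV; the gain values and function name are illustrative assumptions, not the exact YOLOX implementation.

```python
import cv2
import numpy as np

def augment_hsv(img, hgain=0.015, sgain=0.7, vgain=0.4):
    # Randomly scale hue, saturation, and value; the gains here are assumed defaults.
    r = np.random.uniform(-1, 1, 3) * [hgain, sgain, vgain] + 1
    hue, sat, val = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV))
    x = np.arange(0, 256, dtype=r.dtype)
    lut_hue = ((x * r[0]) % 180).astype(img.dtype)        # OpenCV hue range is [0, 180)
    lut_sat = np.clip(x * r[1], 0, 255).astype(img.dtype)
    lut_val = np.clip(x * r[2], 0, 255).astype(img.dtype)
    img_hsv = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val)))
    return cv2.cvtColor(img_hsv, cv2.COLOR_HSV2BGR)

# Example: jitter a dummy BGR uint8 image standing in for a training sample.
img = np.random.randint(0, 256, (416, 416, 3), dtype=np.uint8)
aug = augment_hsv(img)
```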
11
+ ### 2X Faster training
12
+
13
+ We optimized the data preprocessing and added image caching via the `--cache` flag:
14
+
15
+ ```shell
16
+ python tools/train.py -n yolox-s -d 8 -b 64 --fp16 -o [--cache]
17
+ yolox-m
18
+ yolox-l
19
+ yolox-x
20
+ ```
21
+ * -d: number of gpu devices
22
+ * -b: total batch size, the recommended number for -b is num-gpu * 8
23
+ * --fp16: mixed precision training
24
+ * --cache: cache images into RAM to accelerate training; this requires a large amount of system RAM.
25
+
26
+ ### Higher performance
27
+
28
+ New models achieve **~1%** higher performance! See [Model_Zoo](model_zoo.md) for more details.
29
+
30
+ ### Support torch amp
31
+
32
+ We now support torch.cuda.amp training; Apex is no longer required.
33
+
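Since the note above replaces Apex with native PyTorch AMP, here is a minimal, self-contained sketch of what `--fp16` enables via `torch.cuda.amp`; the model and data are stand-ins, not the actual YOLOX trainer.

```python
import torch

# Stand-in model/optimizer; the real trainer plugs in the YOLOX detector and dataloader.
model = torch.nn.Linear(16, 16).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=True)      # enabled when --fp16 is set

for _ in range(10):
    inputs = torch.randn(8, 16, device="cuda")
    targets = torch.randn(8, 16, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=True):        # forward pass in mixed precision
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()                      # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)                             # unscales gradients, then steps
    scaler.update()
```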
34
+ ### Breaking changes
35
+
36
+ We removed the normalization operations (subtract mean, divide by std) from preprocessing. This makes the old weights **incompatible**.
37
+
38
+ If you still want to use the old weights, add `--legacy` to the demo and eval commands:
39
+
40
+ ```shell
41
+ python tools/demo.py image -n yolox-s -c /path/to/your/yolox_s.pth --path assets/dog.jpg --conf 0.25 --nms 0.45 --tsize 640 --save_result --device [cpu/gpu] [--legacy]
42
+ ```
43
+
44
+ and
45
+
46
+ ```shell
47
+ python tools/eval.py -n yolox-s -c yolox_s.pth -b 64 -d 8 --conf 0.001 [--fp16] [--fuse] [--legacy]
48
+ yolox-m
49
+ yolox-l
50
+ yolox-x
51
+ ```
52
+
53
+ The deployment demos, however, no longer support the old weights. Users can check out YOLOX version 0.1.0 to deploy with legacy weights.
54
+
55
+
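To make the breaking change above concrete, here is a hedged sketch of what the preprocessing reduces to for a single BGR `uint8` image; the function name and exact resize/pad details are illustrative assumptions rather than the exact YOLOX helpers.

```python
import cv2
import numpy as np

def preprocess(img, input_size=(640, 640), legacy=False):
    # Letterbox-resize onto a gray (114) canvas while keeping the aspect ratio.
    padded = np.full((input_size[0], input_size[1], 3), 114, dtype=np.uint8)
    r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
    resized = cv2.resize(img, (int(img.shape[1] * r), int(img.shape[0] * r)),
                         interpolation=cv2.INTER_LINEAR)
    padded[: resized.shape[0], : resized.shape[1]] = resized

    blob = padded.transpose(2, 0, 1).astype(np.float32)   # HWC -> CHW
    if legacy:
        # Old behaviour restored by --legacy: BGR->RGB, scale to [0, 1], ImageNet mean/std.
        blob = blob[::-1, :, :] / 255.0
        blob -= np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
        blob /= np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)
    # New behaviour: raw 0-255 BGR pixel values, no normalization at all.
    return blob, r

blob, ratio = preprocess(np.zeros((480, 640, 3), dtype=np.uint8))
```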
exps/default/nano.py CHANGED
@@ -17,8 +17,9 @@ class Exp(MyExp):
17
  self.scale = (0.5, 1.5)
18
  self.random_size = (10, 20)
19
  self.test_size = (416, 416)
20
- self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
21
  self.enable_mixup = False
 
22
 
23
  def get_model(self, sublinear=False):
24
 
 
17
  self.scale = (0.5, 1.5)
18
  self.random_size = (10, 20)
19
  self.test_size = (416, 416)
20
+ self.mosaic_prob = 0.5
21
  self.enable_mixup = False
22
+ self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
23
 
24
  def get_model(self, sublinear=False):
25
 
exps/example/yolox_voc/yolox_voc_s.py CHANGED
@@ -16,7 +16,7 @@ class Exp(MyExp):
16
  self.width = 0.50
17
  self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
18
 
19
- def get_data_loader(self, batch_size, is_distributed, no_aug=False):
20
  from yolox.data import (
21
  VOCDetection,
22
  TrainTransform,
@@ -24,34 +24,36 @@ class Exp(MyExp):
24
  DataLoader,
25
  InfiniteSampler,
26
  MosaicDetection,
 
27
  )
28
-
29
- dataset = VOCDetection(
30
- data_dir=os.path.join(get_yolox_datadir(), "VOCdevkit"),
31
- image_sets=[('2007', 'trainval'), ('2012', 'trainval')],
32
- img_size=self.input_size,
33
- preproc=TrainTransform(
34
- rgb_means=(0.485, 0.456, 0.406),
35
- std=(0.229, 0.224, 0.225),
36
- max_labels=50,
37
- ),
38
  )
39
 
40
  dataset = MosaicDetection(
41
  dataset,
42
  mosaic=not no_aug,
43
  img_size=self.input_size,
44
- preproc=TrainTransform(
45
- rgb_means=(0.485, 0.456, 0.406),
46
- std=(0.229, 0.224, 0.225),
47
- max_labels=120,
48
- ),
49
  degrees=self.degrees,
50
  translate=self.translate,
51
  scale=self.scale,
52
  shear=self.shear,
53
  perspective=self.perspective,
54
  enable_mixup=self.enable_mixup,
 
 
55
  )
56
 
57
  self.dataset = dataset
@@ -67,27 +69,27 @@ class Exp(MyExp):
67
  sampler=sampler,
68
  batch_size=batch_size,
69
  drop_last=False,
70
- input_dimension=self.input_size,
71
  mosaic=not no_aug,
72
  )
73
 
74
  dataloader_kwargs = {"num_workers": self.data_num_workers, "pin_memory": True}
75
  dataloader_kwargs["batch_sampler"] = batch_sampler
 
 
 
 
76
  train_loader = DataLoader(self.dataset, **dataloader_kwargs)
77
 
78
  return train_loader
79
 
80
- def get_eval_loader(self, batch_size, is_distributed, testdev=False):
81
  from yolox.data import VOCDetection, ValTransform
82
 
83
  valdataset = VOCDetection(
84
  data_dir=os.path.join(get_yolox_datadir(), "VOCdevkit"),
85
  image_sets=[('2007', 'test')],
86
  img_size=self.test_size,
87
- preproc=ValTransform(
88
- rgb_means=(0.485, 0.456, 0.406),
89
- std=(0.229, 0.224, 0.225),
90
- ),
91
  )
92
 
93
  if is_distributed:
@@ -108,10 +110,10 @@ class Exp(MyExp):
108
 
109
  return val_loader
110
 
111
- def get_evaluator(self, batch_size, is_distributed, testdev=False):
112
  from yolox.evaluators import VOCEvaluator
113
 
114
- val_loader = self.get_eval_loader(batch_size, is_distributed, testdev=testdev)
115
  evaluator = VOCEvaluator(
116
  dataloader=val_loader,
117
  img_size=self.test_size,
 
16
  self.width = 0.50
17
  self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
18
 
19
+ def get_data_loader(self, batch_size, is_distributed, no_aug=False, cache_img=False):
20
  from yolox.data import (
21
  VOCDetection,
22
  TrainTransform,
 
24
  DataLoader,
25
  InfiniteSampler,
26
  MosaicDetection,
27
+ worker_init_reset_seed,
28
  )
29
+ from yolox.utils import (
30
+ wait_for_the_master,
31
+ get_local_rank,
 
 
 
 
 
 
 
32
  )
33
+ local_rank = get_local_rank()
34
+
35
+ with wait_for_the_master(local_rank):
36
+ dataset = VOCDetection(
37
+ data_dir=os.path.join(get_yolox_datadir(), "VOCdevkit"),
38
+ image_sets=[('2007', 'trainval'), ('2012', 'trainval')],
39
+ img_size=self.input_size,
40
+ preproc=TrainTransform(max_labels=50),
41
+ cache=cache_img,
42
+ )
43
 
44
  dataset = MosaicDetection(
45
  dataset,
46
  mosaic=not no_aug,
47
  img_size=self.input_size,
48
+ preproc=TrainTransform(max_labels=120),
 
 
 
 
49
  degrees=self.degrees,
50
  translate=self.translate,
51
  scale=self.scale,
52
  shear=self.shear,
53
  perspective=self.perspective,
54
  enable_mixup=self.enable_mixup,
55
+ mosaic_prob=self.mosaic_prob,
56
+ mixup_prob=self.mixup_prob,
57
  )
58
 
59
  self.dataset = dataset
 
69
  sampler=sampler,
70
  batch_size=batch_size,
71
  drop_last=False,
 
72
  mosaic=not no_aug,
73
  )
74
 
75
  dataloader_kwargs = {"num_workers": self.data_num_workers, "pin_memory": True}
76
  dataloader_kwargs["batch_sampler"] = batch_sampler
77
+
78
+ # Make sure each process has a different random seed, especially when using the 'fork' start method
79
+ dataloader_kwargs["worker_init_fn"] = worker_init_reset_seed
80
+
81
  train_loader = DataLoader(self.dataset, **dataloader_kwargs)
82
 
83
  return train_loader
84
 
85
+ def get_eval_loader(self, batch_size, is_distributed, testdev=False, legacy=False):
86
  from yolox.data import VOCDetection, ValTransform
87
 
88
  valdataset = VOCDetection(
89
  data_dir=os.path.join(get_yolox_datadir(), "VOCdevkit"),
90
  image_sets=[('2007', 'test')],
91
  img_size=self.test_size,
92
+ preproc=ValTransform(legacy=legacy),
 
 
 
93
  )
94
 
95
  if is_distributed:
 
110
 
111
  return val_loader
112
 
113
+ def get_evaluator(self, batch_size, is_distributed, testdev=False, legacy=False):
114
  from yolox.evaluators import VOCEvaluator
115
 
116
+ val_loader = self.get_eval_loader(batch_size, is_distributed, testdev, legacy)
117
  evaluator = VOCEvaluator(
118
  dataloader=val_loader,
119
  img_size=self.test_size,
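The `worker_init_reset_seed` hook registered in the data loader above exists so that each dataloader worker draws different random augmentations; below is a hypothetical sketch of such a hook (the real implementation in `yolox.data` may differ).

```python
import random
import uuid

import numpy as np
import torch

def worker_init_reset_seed(worker_id):
    # Give every dataloader worker an independent seed. This matters most with the
    # "fork" start method, where workers would otherwise inherit identical RNG state.
    seed = uuid.uuid4().int % 2**32
    random.seed(seed)
    torch.manual_seed(seed)
    np.random.seed(seed)
```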
tools/demo.py CHANGED
@@ -11,7 +11,7 @@ import cv2
11
 
12
  import torch
13
 
14
- from yolox.data.data_augment import preproc
15
  from yolox.data.datasets import COCO_CLASSES
16
  from yolox.exp import get_exp
17
  from yolox.utils import fuse_model, get_model_info, postprocess, vis
@@ -52,8 +52,8 @@ def make_parser():
52
  type=str,
53
  help="device to run our model, can either be cpu or gpu",
54
  )
55
- parser.add_argument("--conf", default=None, type=float, help="test conf")
56
- parser.add_argument("--nms", default=None, type=float, help="test nms threshold")
57
  parser.add_argument("--tsize", default=None, type=int, help="test img size")
58
  parser.add_argument(
59
  "--fp16",
@@ -62,6 +62,13 @@ def make_parser():
62
  action="store_true",
63
  help="Adopting mix precision evaluating.",
64
  )
 
65
  parser.add_argument(
66
  "--fuse",
67
  dest="fuse",
@@ -99,6 +106,7 @@ class Predictor(object):
99
  trt_file=None,
100
  decoder=None,
101
  device="cpu",
 
102
  ):
103
  self.model = model
104
  self.cls_names = cls_names
@@ -108,6 +116,7 @@ class Predictor(object):
108
  self.nmsthre = exp.nmsthre
109
  self.test_size = exp.test_size
110
  self.device = device
 
111
  if trt_file is not None:
112
  from torch2trt import TRTModule
113
 
@@ -117,8 +126,6 @@ class Predictor(object):
117
  x = torch.ones(1, 3, exp.test_size[0], exp.test_size[1]).cuda()
118
  self.model(x)
119
  self.model = model_trt
120
- self.rgb_means = (0.485, 0.456, 0.406)
121
- self.std = (0.229, 0.224, 0.225)
122
 
123
  def inference(self, img):
124
  img_info = {"id": 0}
@@ -133,8 +140,10 @@ class Predictor(object):
133
  img_info["width"] = width
134
  img_info["raw_img"] = img
135
 
136
- img, ratio = preproc(img, self.test_size, self.rgb_means, self.std)
137
  img_info["ratio"] = ratio
 
 
138
  img = torch.from_numpy(img).unsqueeze(0)
139
  if self.device == "gpu":
140
  img = img.cuda()
@@ -229,6 +238,7 @@ def main(exp, args):
229
  file_name = os.path.join(exp.output_dir, args.experiment_name)
230
  os.makedirs(file_name, exist_ok=True)
231
 
 
232
  if args.save_result:
233
  vis_folder = os.path.join(file_name, "vis_res")
234
  os.makedirs(vis_folder, exist_ok=True)
@@ -280,7 +290,7 @@ def main(exp, args):
280
  trt_file = None
281
  decoder = None
282
 
283
- predictor = Predictor(model, exp, COCO_CLASSES, trt_file, decoder, args.device)
284
  current_time = time.localtime()
285
  if args.demo == "image":
286
  image_demo(predictor, vis_folder, args.path, current_time, args.save_result)
 
11
 
12
  import torch
13
 
14
+ from yolox.data.data_augment import ValTransform
15
  from yolox.data.datasets import COCO_CLASSES
16
  from yolox.exp import get_exp
17
  from yolox.utils import fuse_model, get_model_info, postprocess, vis
 
52
  type=str,
53
  help="device to run our model, can either be cpu or gpu",
54
  )
55
+ parser.add_argument("--conf", default=0.3, type=float, help="test conf")
56
+ parser.add_argument("--nms", default=0.3, type=float, help="test nms threshold")
57
  parser.add_argument("--tsize", default=None, type=int, help="test img size")
58
  parser.add_argument(
59
  "--fp16",
 
62
  action="store_true",
63
  help="Adopting mix precision evaluating.",
64
  )
65
+ parser.add_argument(
66
+ "--legacy",
67
+ dest="legacy",
68
+ default=False,
69
+ action="store_true",
70
+ help="To be compatible with older versions",
71
+ )
72
  parser.add_argument(
73
  "--fuse",
74
  dest="fuse",
 
106
  trt_file=None,
107
  decoder=None,
108
  device="cpu",
109
+ legacy=False,
110
  ):
111
  self.model = model
112
  self.cls_names = cls_names
 
116
  self.nmsthre = exp.nmsthre
117
  self.test_size = exp.test_size
118
  self.device = device
119
+ self.preproc = ValTransform(legacy=legacy)
120
  if trt_file is not None:
121
  from torch2trt import TRTModule
122
 
 
126
  x = torch.ones(1, 3, exp.test_size[0], exp.test_size[1]).cuda()
127
  self.model(x)
128
  self.model = model_trt
 
 
129
 
130
  def inference(self, img):
131
  img_info = {"id": 0}
 
140
  img_info["width"] = width
141
  img_info["raw_img"] = img
142
 
143
+ ratio = min(self.test_size[0] / img.shape[0], self.test_size[1] / img.shape[1])
144
  img_info["ratio"] = ratio
145
+
146
+ img, _ = self.preproc(img, None, self.test_size)
147
  img = torch.from_numpy(img).unsqueeze(0)
148
  if self.device == "gpu":
149
  img = img.cuda()
 
238
  file_name = os.path.join(exp.output_dir, args.experiment_name)
239
  os.makedirs(file_name, exist_ok=True)
240
 
241
+ vis_folder = None
242
  if args.save_result:
243
  vis_folder = os.path.join(file_name, "vis_res")
244
  os.makedirs(vis_folder, exist_ok=True)
 
290
  trt_file = None
291
  decoder = None
292
 
293
+ predictor = Predictor(model, exp, COCO_CLASSES, trt_file, decoder, args.device, args.legacy)
294
  current_time = time.localtime()
295
  if args.demo == "image":
296
  image_demo(predictor, vis_folder, args.path, current_time, args.save_result)
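In the updated `Predictor.inference` above, the stored `ratio` is the letterbox scale factor; downstream visualization divides the predicted boxes by it to map them back to the original image. A minimal sketch of that rescaling, assuming `(x1, y1, x2, y2)` boxes:

```python
import numpy as np

def rescale_boxes(boxes, ratio):
    # Boxes are predicted in network-input (letterboxed) coordinates; dividing by
    # the resize ratio maps them back to original-image pixels.
    return np.asarray(boxes, dtype=np.float32) / ratio

# Example: a 960x1280 image fed to a 640x640 network gives ratio = min(640/960, 640/1280) = 0.5.
print(rescale_boxes([[32.0, 16.0, 320.0, 240.0]], ratio=0.5))
```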
tools/eval.py CHANGED
@@ -75,6 +75,13 @@ def make_parser():
75
  action="store_true",
76
  help="Using TensorRT model for testing.",
77
  )
78
  parser.add_argument(
79
  "--test",
80
  dest="test",
@@ -135,7 +142,7 @@ def main(exp, args, num_gpu):
135
  logger.info("Model Summary: {}".format(get_model_info(model, exp.test_size)))
136
  logger.info("Model Structure:\n{}".format(str(model)))
137
 
138
- evaluator = exp.get_evaluator(args.batch_size, is_distributed, args.test)
139
 
140
  torch.cuda.set_device(rank)
141
  model.cuda(rank)
 
75
  action="store_true",
76
  help="Using TensorRT model for testing.",
77
  )
78
+ parser.add_argument(
79
+ "--legacy",
80
+ dest="legacy",
81
+ default=False,
82
+ action="store_true",
83
+ help="To be compatible with older versions",
84
+ )
85
  parser.add_argument(
86
  "--test",
87
  dest="test",
 
142
  logger.info("Model Summary: {}".format(get_model_info(model, exp.test_size)))
143
  logger.info("Model Structure:\n{}".format(str(model)))
144
 
145
+ evaluator = exp.get_evaluator(args.batch_size, is_distributed, args.test, args.legacy)
146
 
147
  torch.cuda.set_device(rank)
148
  model.cuda(rank)
tools/train.py CHANGED
@@ -12,7 +12,7 @@ import torch.backends.cudnn as cudnn
12
 
13
  from yolox.core import Trainer, launch
14
  from yolox.exp import get_exp
15
- from yolox.utils import configure_nccl, configure_omp
16
 
17
 
18
  def make_parser():
@@ -65,6 +65,13 @@ def make_parser():
65
  action="store_true",
66
  help="Adopting mix precision training.",
67
  )
68
  parser.add_argument(
69
  "-o",
70
  "--occupy",
@@ -111,8 +118,8 @@ if __name__ == "__main__":
111
  if not args.experiment_name:
112
  args.experiment_name = exp.exp_name
113
 
114
- num_gpu = torch.cuda.device_count() if args.devices is None else args.devices
115
- assert num_gpu <= torch.cuda.device_count()
116
 
117
  dist_url = "auto" if args.dist_url is None else args.dist_url
118
  launch(
 
12
 
13
  from yolox.core import Trainer, launch
14
  from yolox.exp import get_exp
15
+ from yolox.utils import configure_nccl, configure_omp, get_num_devices
16
 
17
 
18
  def make_parser():
 
65
  action="store_true",
66
  help="Adopting mix precision training.",
67
  )
68
+ parser.add_argument(
69
+ "--cache",
70
+ dest="cache",
71
+ default=False,
72
+ action="store_true",
73
+ help="Caching imgs to RAM for fast training.",
74
+ )
75
  parser.add_argument(
76
  "-o",
77
  "--occupy",
 
118
  if not args.experiment_name:
119
  args.experiment_name = exp.exp_name
120
 
121
+ num_gpu = get_num_devices() if args.devices is None else args.devices
122
+ assert num_gpu <= get_num_devices()
123
 
124
  dist_url = "auto" if args.dist_url is None else args.dist_url
125
  launch(
yolox/core/launch.py CHANGED
@@ -5,6 +5,7 @@
5
  # Copyright (c) Facebook, Inc. and its affiliates.
6
  # Copyright (c) Megvii, Inc. and its affiliates.
7
 
 
8
  from datetime import timedelta
9
  from loguru import logger
10
 
@@ -61,18 +62,37 @@ def launch(
61
  # TODO prctl in spawned processes
62
 
63
  if dist_url == "auto":
64
- assert num_machines == 1, "dist_url=auto cannot work with distributed training."
 
 
65
  port = _find_free_port()
66
  dist_url = f"tcp://127.0.0.1:{port}"
67
 
68
- mp.spawn(
 
69
  _distributed_worker,
70
  nprocs=num_gpus_per_machine,
71
  args=(
72
- main_func, world_size, num_gpus_per_machine,
73
- machine_rank, backend, dist_url, args
 
 
 
 
 
74
  ),
75
  daemon=False,
 
76
  )
77
  else:
78
  main_func(*args)
@@ -89,7 +109,9 @@ def _distributed_worker(
89
  args,
90
  timeout=DEFAULT_TIMEOUT,
91
  ):
92
- assert torch.cuda.is_available(), "cuda is not available. Please check your installation."
 
 
93
  global_rank = machine_rank * num_gpus_per_machine + local_rank
94
  logger.info("Rank {} initialization finished.".format(global_rank))
95
  try:
@@ -108,7 +130,9 @@ def _distributed_worker(
108
  assert comm._LOCAL_PROCESS_GROUP is None
109
  num_machines = world_size // num_gpus_per_machine
110
  for i in range(num_machines):
111
- ranks_on_i = list(range(i * num_gpus_per_machine, (i + 1) * num_gpus_per_machine))
 
 
112
  pg = dist.new_group(ranks_on_i)
113
  if i == machine_rank:
114
  comm._LOCAL_PROCESS_GROUP = pg
 
5
  # Copyright (c) Facebook, Inc. and its affiliates.
6
  # Copyright (c) Megvii, Inc. and its affiliates.
7
 
8
+ import sys
9
  from datetime import timedelta
10
  from loguru import logger
11
 
 
62
  # TODO prctl in spawned processes
63
 
64
  if dist_url == "auto":
65
+ assert (
66
+ num_machines == 1
67
+ ), "dist_url=auto cannot work with distributed training."
68
  port = _find_free_port()
69
  dist_url = f"tcp://127.0.0.1:{port}"
70
 
71
+ start_method = "spawn"
72
+ cache = vars(args[1]).get("cache", False)
73
+
74
+ # To use numpy memmap for caching images into RAM, we have to use the fork start method
75
+ if cache:
76
+ assert sys.platform != "win32", (
77
+ "As Windows platform doesn't support fork method, "
78
+ "do not add --cache in your training command."
79
+ )
80
+ start_method = "fork"
81
+
82
+ mp.start_processes(
83
  _distributed_worker,
84
  nprocs=num_gpus_per_machine,
85
  args=(
86
+ main_func,
87
+ world_size,
88
+ num_gpus_per_machine,
89
+ machine_rank,
90
+ backend,
91
+ dist_url,
92
+ args,
93
  ),
94
  daemon=False,
95
+ start_method=start_method,
96
  )
97
  else:
98
  main_func(*args)
 
109
  args,
110
  timeout=DEFAULT_TIMEOUT,
111
  ):
112
+ assert (
113
+ torch.cuda.is_available()
114
+ ), "cuda is not available. Please check your installation."
115
  global_rank = machine_rank * num_gpus_per_machine + local_rank
116
  logger.info("Rank {} initialization finished.".format(global_rank))
117
  try:
 
130
  assert comm._LOCAL_PROCESS_GROUP is None
131
  num_machines = world_size // num_gpus_per_machine
132
  for i in range(num_machines):
133
+ ranks_on_i = list(
134
+ range(i * num_gpus_per_machine, (i + 1) * num_gpus_per_machine)
135
+ )
136
  pg = dist.new_group(ranks_on_i)
137
  if i == machine_rank:
138
  comm._LOCAL_PROCESS_GROUP = pg
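
The fork/spawn switch above boils down to a small decision: a numpy memmap-backed image cache is only visible to child processes when they are forked, and Windows has no fork. A minimal, hedged sketch of that logic (standalone illustration; `worker_fn` and `worker_args` are placeholders, not repo objects):

```python
# Hedged sketch of the start-method selection performed in launch().
import sys

import torch.multiprocessing as mp


def spawn_workers(worker_fn, nprocs, worker_args, cache=False):
    # memmap-backed caches are only shared with children via fork,
    # which is unavailable on Windows, so --cache must be dropped there.
    if cache and sys.platform == "win32":
        raise RuntimeError("fork is unavailable on Windows; remove --cache")
    start_method = "fork" if cache else "spawn"
    mp.start_processes(
        worker_fn,
        nprocs=nprocs,
        args=worker_args,
        daemon=False,
        start_method=start_method,
    )
```
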
yolox/core/trainer.py CHANGED
@@ -7,9 +7,8 @@ import os
7
  import time
8
  from loguru import logger
9
 
10
- import apex
11
  import torch
12
- from apex import amp
13
  from torch.utils.tensorboard import SummaryWriter
14
 
15
  from yolox.data import DataPrefetcher
@@ -41,6 +40,7 @@ class Trainer:
41
  # training related attr
42
  self.max_epoch = exp.max_epoch
43
  self.amp_training = args.fp16
 
44
  self.is_distributed = get_world_size() > 1
45
  self.rank = get_rank()
46
  self.local_rank = get_local_rank()
@@ -94,18 +94,18 @@ class Trainer:
94
  inps = inps.to(self.data_type)
95
  targets = targets.to(self.data_type)
96
  targets.requires_grad = False
 
97
  data_end_time = time.time()
98
 
99
- outputs = self.model(inps, targets)
 
 
100
  loss = outputs["total_loss"]
101
 
102
  self.optimizer.zero_grad()
103
- if self.amp_training:
104
- with amp.scale_loss(loss, self.optimizer) as scaled_loss:
105
- scaled_loss.backward()
106
- else:
107
- loss.backward()
108
- self.optimizer.step()
109
 
110
  if self.use_model_ema:
111
  self.ema_model.update(self.model)
@@ -137,9 +137,6 @@ class Trainer:
137
  # solver related init
138
  self.optimizer = self.exp.get_optimizer(self.args.batch_size)
139
 
140
- if self.amp_training:
141
- model, optimizer = amp.initialize(model, self.optimizer, opt_level="O1")
142
-
143
  # value of epoch will be set in `resume_train`
144
  model = self.resume_train(model)
145
 
@@ -149,6 +146,7 @@ class Trainer:
149
  batch_size=self.args.batch_size,
150
  is_distributed=self.is_distributed,
151
  no_aug=self.no_aug,
 
152
  )
153
  logger.info("init prefetcher, this might take one minute or less...")
154
  self.prefetcher = DataPrefetcher(self.train_loader)
@@ -162,9 +160,7 @@ class Trainer:
162
  occupy_mem(self.local_rank)
163
 
164
  if self.is_distributed:
165
- model = apex.parallel.DistributedDataParallel(model)
166
- # from torch.nn.parallel import DistributedDataParallel as DDP
167
- # model = DDP(model, device_ids=[self.local_rank], broadcast_buffers=False)
168
 
169
  if self.use_model_ema:
170
  self.ema_model = ModelEMA(model, 0.9998)
@@ -274,8 +270,6 @@ class Trainer:
274
  model.load_state_dict(ckpt["model"])
275
  self.optimizer.load_state_dict(ckpt["optimizer"])
276
  # resume the training states variables
277
- if self.amp_training and "amp" in ckpt:
278
- amp.load_state_dict(ckpt["amp"])
279
  start_epoch = (
280
  self.args.start_epoch - 1
281
  if self.args.start_epoch is not None
@@ -327,10 +321,6 @@ class Trainer:
327
  "model": save_model.state_dict(),
328
  "optimizer": self.optimizer.state_dict(),
329
  }
330
- if self.amp_training:
331
- # save amp state according to
332
- # https://nvidia.github.io/apex/amp.html#checkpointing
333
- ckpt_state["amp"] = amp.state_dict()
334
  save_checkpoint(
335
  ckpt_state,
336
  update_best_ckpt,
 
7
  import time
8
  from loguru import logger
9
 
 
10
  import torch
11
+ from torch.nn.parallel import DistributedDataParallel as DDP
12
  from torch.utils.tensorboard import SummaryWriter
13
 
14
  from yolox.data import DataPrefetcher
 
40
  # training related attr
41
  self.max_epoch = exp.max_epoch
42
  self.amp_training = args.fp16
43
+ self.scaler = torch.cuda.amp.GradScaler(enabled=args.fp16)
44
  self.is_distributed = get_world_size() > 1
45
  self.rank = get_rank()
46
  self.local_rank = get_local_rank()
 
94
  inps = inps.to(self.data_type)
95
  targets = targets.to(self.data_type)
96
  targets.requires_grad = False
97
+ inps, targets = self.exp.preprocess(inps, targets, self.input_size)
98
  data_end_time = time.time()
99
 
100
+ with torch.cuda.amp.autocast(enabled=self.amp_training):
101
+ outputs = self.model(inps, targets)
102
+
103
  loss = outputs["total_loss"]
104
 
105
  self.optimizer.zero_grad()
106
+ self.scaler.scale(loss).backward()
107
+ self.scaler.step(self.optimizer)
108
+ self.scaler.update()
 
 
 
109
 
110
  if self.use_model_ema:
111
  self.ema_model.update(self.model)
 
137
  # solver related init
138
  self.optimizer = self.exp.get_optimizer(self.args.batch_size)
139
 
 
 
 
140
  # value of epoch will be set in `resume_train`
141
  model = self.resume_train(model)
142
 
 
146
  batch_size=self.args.batch_size,
147
  is_distributed=self.is_distributed,
148
  no_aug=self.no_aug,
149
+ cache_img=self.args.cache,
150
  )
151
  logger.info("init prefetcher, this might take one minute or less...")
152
  self.prefetcher = DataPrefetcher(self.train_loader)
 
160
  occupy_mem(self.local_rank)
161
 
162
  if self.is_distributed:
163
+ model = DDP(model, device_ids=[self.local_rank], broadcast_buffers=False)
 
 
164
 
165
  if self.use_model_ema:
166
  self.ema_model = ModelEMA(model, 0.9998)
 
270
  model.load_state_dict(ckpt["model"])
271
  self.optimizer.load_state_dict(ckpt["optimizer"])
272
  # resume the training states variables
 
 
273
  start_epoch = (
274
  self.args.start_epoch - 1
275
  if self.args.start_epoch is not None
 
321
  "model": save_model.state_dict(),
322
  "optimizer": self.optimizer.state_dict(),
323
  }
 
 
 
 
324
  save_checkpoint(
325
  ckpt_state,
326
  update_best_ckpt,
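
For reference, the apex calls removed from the trainer reduce to the standard native-AMP pattern below. This is a minimal sketch with placeholder `model`, `optimizer`, and `loader`, not the trainer itself; it only assumes the model returns a dict containing `"total_loss"` in training mode, as the diff shows.

```python
# Hedged sketch of the torch.cuda.amp pattern that replaces apex here.
import torch


def train_one_epoch(model, optimizer, loader, use_amp=True, device="cuda"):
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    model.train()
    for inps, targets in loader:
        inps, targets = inps.to(device), targets.to(device)
        with torch.cuda.amp.autocast(enabled=use_amp):
            loss = model(inps, targets)["total_loss"]  # forward in mixed precision
        optimizer.zero_grad()
        scaler.scale(loss).backward()  # scale the loss to avoid fp16 gradient underflow
        scaler.step(optimizer)         # unscales gradients, then calls optimizer.step()
        scaler.update()                # adapt the scale factor for the next iteration
```
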
yolox/data/__init__.py CHANGED
@@ -4,6 +4,6 @@
4
 
5
  from .data_augment import TrainTransform, ValTransform
6
  from .data_prefetcher import DataPrefetcher
7
- from .dataloading import DataLoader, get_yolox_datadir
8
  from .datasets import *
9
  from .samplers import InfiniteSampler, YoloBatchSampler
 
4
 
5
  from .data_augment import TrainTransform, ValTransform
6
  from .data_prefetcher import DataPrefetcher
7
+ from .dataloading import DataLoader, get_yolox_datadir, worker_init_reset_seed
8
  from .datasets import *
9
  from .samplers import InfiniteSampler, YoloBatchSampler
yolox/data/data_augment.py CHANGED
@@ -140,36 +140,6 @@ def random_perspective(
140
  return img, targets
141
 
142
 
143
- def _distort(image):
144
- def _convert(image, alpha=1, beta=0):
145
- tmp = image.astype(float) * alpha + beta
146
- tmp[tmp < 0] = 0
147
- tmp[tmp > 255] = 255
148
- image[:] = tmp
149
-
150
- image = image.copy()
151
-
152
- if random.randrange(2):
153
- _convert(image, beta=random.uniform(-32, 32))
154
-
155
- if random.randrange(2):
156
- _convert(image, alpha=random.uniform(0.5, 1.5))
157
-
158
- image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
159
-
160
- if random.randrange(2):
161
- tmp = image[:, :, 0].astype(int) + random.randint(-18, 18)
162
- tmp %= 180
163
- image[:, :, 0] = tmp
164
-
165
- if random.randrange(2):
166
- _convert(image[:, :, 1], alpha=random.uniform(0.5, 1.5))
167
-
168
- image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR)
169
-
170
- return image
171
-
172
-
173
  def _mirror(image, boxes):
174
  _, width, _ = image.shape
175
  if random.randrange(2):
@@ -179,36 +149,27 @@ def _mirror(image, boxes):
179
  return image, boxes
180
 
181
 
182
- def preproc(image, input_size, mean, std, swap=(2, 0, 1)):
183
- if len(image.shape) == 3:
184
- padded_img = np.ones((input_size[0], input_size[1], 3)) * 114.0
185
  else:
186
- padded_img = np.ones(input_size) * 114.0
187
- img = np.array(image)
188
  r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
189
  resized_img = cv2.resize(
190
  img,
191
  (int(img.shape[1] * r), int(img.shape[0] * r)),
192
  interpolation=cv2.INTER_LINEAR,
193
- ).astype(np.float32)
194
  padded_img[: int(img.shape[0] * r), : int(img.shape[1] * r)] = resized_img
195
 
196
- padded_img = padded_img[:, :, ::-1]
197
- padded_img /= 255.0
198
- if mean is not None:
199
- padded_img -= mean
200
- if std is not None:
201
- padded_img /= std
202
  padded_img = padded_img.transpose(swap)
203
  padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
204
  return padded_img, r
205
 
206
 
207
  class TrainTransform:
208
- def __init__(self, p=0.5, rgb_means=None, std=None, max_labels=50):
209
- self.means = rgb_means
210
- self.std = std
211
- self.p = p
212
  self.max_labels = max_labels
213
 
214
  def __call__(self, image, targets, input_dim):
@@ -216,8 +177,7 @@ class TrainTransform:
216
  labels = targets[:, 4].copy()
217
  if len(boxes) == 0:
218
  targets = np.zeros((self.max_labels, 5), dtype=np.float32)
219
- image, r_o = preproc(image, input_dim, self.means, self.std)
220
- image = np.ascontiguousarray(image, dtype=np.float32)
221
  return image, targets
222
 
223
  image_o = image.copy()
@@ -228,10 +188,10 @@ class TrainTransform:
228
  # bbox_o: [xyxy] to [c_x,c_y,w,h]
229
  boxes_o = xyxy2cxcywh(boxes_o)
230
 
231
- image_t = _distort(image)
232
- image_t, boxes = _mirror(image_t, boxes)
233
  height, width, _ = image_t.shape
234
- image_t, r_ = preproc(image_t, input_dim, self.means, self.std)
235
  # boxes [xyxy] 2 [cx,cy,w,h]
236
  boxes = xyxy2cxcywh(boxes)
237
  boxes *= r_
@@ -241,7 +201,7 @@ class TrainTransform:
241
  labels_t = labels[mask_b]
242
 
243
  if len(boxes_t) == 0:
244
- image_t, r_o = preproc(image_o, input_dim, self.means, self.std)
245
  boxes_o *= r_o
246
  boxes_t = boxes_o
247
  labels_t = labels_o
@@ -254,7 +214,6 @@ class TrainTransform:
254
  : self.max_labels
255
  ]
256
  padded_labels = np.ascontiguousarray(padded_labels, dtype=np.float32)
257
- image_t = np.ascontiguousarray(image_t, dtype=np.float32)
258
  return image_t, padded_labels
259
 
260
 
@@ -276,12 +235,16 @@ class ValTransform:
276
  data
277
  """
278
 
279
- def __init__(self, rgb_means=None, std=None, swap=(2, 0, 1)):
280
- self.means = rgb_means
281
  self.swap = swap
282
- self.std = std
283
 
284
  # assume input is cv2 img for now
285
  def __call__(self, img, res, input_size):
286
- img, _ = preproc(img, input_size, self.means, self.std, self.swap)
 
 
 
 
 
287
  return img, np.zeros((1, 5))
 
140
  return img, targets
141
 
142
 
 
 
143
  def _mirror(image, boxes):
144
  _, width, _ = image.shape
145
  if random.randrange(2):
 
149
  return image, boxes
150
 
151
 
152
+ def preproc(img, input_size, swap=(2, 0, 1)):
153
+ if len(img.shape) == 3:
154
+ padded_img = np.ones((input_size[0], input_size[1], 3), dtype=np.uint8) * 114
155
  else:
156
+ padded_img = np.ones(input_size, dtype=np.uint8) * 114
157
+
158
  r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
159
  resized_img = cv2.resize(
160
  img,
161
  (int(img.shape[1] * r), int(img.shape[0] * r)),
162
  interpolation=cv2.INTER_LINEAR,
163
+ ).astype(np.uint8)
164
  padded_img[: int(img.shape[0] * r), : int(img.shape[1] * r)] = resized_img
165
 
 
 
 
 
 
 
166
  padded_img = padded_img.transpose(swap)
167
  padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
168
  return padded_img, r
169
 
170
 
171
  class TrainTransform:
172
+ def __init__(self, max_labels=50):
 
 
 
173
  self.max_labels = max_labels
174
 
175
  def __call__(self, image, targets, input_dim):
 
177
  labels = targets[:, 4].copy()
178
  if len(boxes) == 0:
179
  targets = np.zeros((self.max_labels, 5), dtype=np.float32)
180
+ image, r_o = preproc(image, input_dim)
 
181
  return image, targets
182
 
183
  image_o = image.copy()
 
188
  # bbox_o: [xyxy] to [c_x,c_y,w,h]
189
  boxes_o = xyxy2cxcywh(boxes_o)
190
 
191
+ augment_hsv(image)
192
+ image_t, boxes = _mirror(image, boxes)
193
  height, width, _ = image_t.shape
194
+ image_t, r_ = preproc(image_t, input_dim)
195
  # boxes [xyxy] 2 [cx,cy,w,h]
196
  boxes = xyxy2cxcywh(boxes)
197
  boxes *= r_
 
201
  labels_t = labels[mask_b]
202
 
203
  if len(boxes_t) == 0:
204
+ image_t, r_o = preproc(image_o, input_dim)
205
  boxes_o *= r_o
206
  boxes_t = boxes_o
207
  labels_t = labels_o
 
214
  : self.max_labels
215
  ]
216
  padded_labels = np.ascontiguousarray(padded_labels, dtype=np.float32)
 
217
  return image_t, padded_labels
218
 
219
 
 
235
  data
236
  """
237
 
238
+ def __init__(self, swap=(2, 0, 1), legacy=False):
 
239
  self.swap = swap
240
+ self.legacy = legacy
241
 
242
  # assume input is cv2 img for now
243
  def __call__(self, img, res, input_size):
244
+ img, _ = preproc(img, input_size, self.swap)
245
+ if self.legacy:
246
+ img = img[::-1, :, :].copy()
247
+ img /= 255.0
248
+ img -= np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
249
+ img /= np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)
250
  return img, np.zeros((1, 5))
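
The net effect of this rewrite is that `preproc` now only letterboxes the image into `input_size` with pad value 114 and returns float32 pixels still in the 0-255 range; mean/std normalization survives only on the `--legacy` path. A rough standalone equivalent (a sketch, not the repo function itself):

```python
# Hedged sketch of the new letterbox-style preproc contract.
import cv2
import numpy as np


def letterbox(img, input_size=(640, 640), swap=(2, 0, 1)):
    padded = np.full((input_size[0], input_size[1], 3), 114, dtype=np.uint8)
    r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
    resized = cv2.resize(
        img,
        (int(img.shape[1] * r), int(img.shape[0] * r)),
        interpolation=cv2.INTER_LINEAR,
    )
    padded[: resized.shape[0], : resized.shape[1]] = resized
    # CHW float32, still un-normalized 0-255 values
    return np.ascontiguousarray(padded.transpose(swap), dtype=np.float32), r
```
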
yolox/data/data_prefetcher.py CHANGED
@@ -2,12 +2,7 @@
2
  # -*- coding:utf-8 -*-
3
  # Copyright (c) Megvii, Inc. and its affiliates.
4
 
5
- import random
6
-
7
  import torch
8
- import torch.distributed as dist
9
-
10
- from yolox.utils import synchronize
11
 
12
 
13
  class DataPrefetcher:
@@ -54,24 +49,3 @@ class DataPrefetcher:
54
  @staticmethod
55
  def _record_stream_for_image(input):
56
  input.record_stream(torch.cuda.current_stream())
57
-
58
-
59
- def random_resize(data_loader, exp, epoch, rank, is_distributed):
60
- tensor = torch.LongTensor(1).cuda()
61
- if is_distributed:
62
- synchronize()
63
-
64
- if rank == 0:
65
- if epoch > exp.max_epoch - 10:
66
- size = exp.input_size
67
- else:
68
- size = random.randint(*exp.random_size)
69
- size = int(32 * size)
70
- tensor.fill_(size)
71
-
72
- if is_distributed:
73
- synchronize()
74
- dist.broadcast(tensor, 0)
75
-
76
- input_size = data_loader.change_input_dim(multiple=tensor.item(), random_range=None)
77
- return input_size
 
2
  # -*- coding:utf-8 -*-
3
  # Copyright (c) Megvii, Inc. and its affiliates.
4
 
 
 
5
  import torch
 
 
 
6
 
7
 
8
  class DataPrefetcher:
 
49
  @staticmethod
50
  def _record_stream_for_image(input):
51
  input.record_stream(torch.cuda.current_stream())
 
 
yolox/data/dataloading.py CHANGED
@@ -4,6 +4,9 @@
4
 
5
  import os
6
  import random
 
 
 
7
 
8
  import torch
9
  from torch.utils.data.dataloader import DataLoader as torchDataLoader
@@ -32,41 +35,6 @@ class DataLoader(torchDataLoader):
32
  See :class:`torch.utils.data.DataLoader` for more information on the arguments.
33
  Check more on the following website:
34
  https://gitlab.com/EAVISE/lightnet/-/blob/master/lightnet/data/_dataloading.py
35
-
36
- Note:
37
- This dataloader only works with :class:`lightnet.data.Dataset` based datasets.
38
-
39
- Example:
40
- >>> class CustomSet(ln.data.Dataset):
41
- ... def __len__(self):
42
- ... return 4
43
- ... @ln.data.Dataset.resize_getitem
44
- ... def __getitem__(self, index):
45
- ... # Should return (image, anno) but here we return (input_dim,)
46
- ... return (self.input_dim,)
47
- >>> dl = ln.data.DataLoader(
48
- ... CustomSet((200,200)),
49
- ... batch_size = 2,
50
- ... collate_fn = ln.data.list_collate # We want the data to be grouped as a list
51
- ... )
52
- >>> dl.dataset.input_dim # Default input_dim
53
- (200, 200)
54
- >>> for d in dl:
55
- ... d
56
- [[(200, 200), (200, 200)]]
57
- [[(200, 200), (200, 200)]]
58
- >>> dl.change_input_dim(320, random_range=None)
59
- (320, 320)
60
- >>> for d in dl:
61
- ... d
62
- [[(320, 320), (320, 320)]]
63
- [[(320, 320), (320, 320)]]
64
- >>> dl.change_input_dim((480, 320), random_range=None)
65
- (480, 320)
66
- >>> for d in dl:
67
- ... d
68
- [[(480, 320), (480, 320)]]
69
- [[(480, 320), (480, 320)]]
70
  """
71
 
72
  def __init__(self, *args, **kwargs):
@@ -120,46 +88,6 @@ class DataLoader(torchDataLoader):
120
  def close_mosaic(self):
121
  self.batch_sampler.mosaic = False
122
 
123
- def change_input_dim(self, multiple=32, random_range=(10, 19)):
124
- """This function will compute a new size and update it on the next mini_batch.
125
-
126
- Args:
127
- multiple (int or tuple, optional): values to multiply the randomly generated range by.
128
- Default **32**
129
- random_range (tuple, optional): This (min, max) tuple sets the range
130
- for the randomisation; Default **(10, 19)**
131
-
132
- Return:
133
- tuple: width, height tuple with new dimension
134
-
135
- Note:
136
- The new size is generated as follows: |br|
137
- First we compute a random integer inside ``[random_range]``.
138
- We then multiply that number with the ``multiple`` argument,
139
- which gives our final new input size. |br|
140
- If ``multiple`` is an integer we generate a square size. If you give a tuple
141
- of **(width, height)**, the size is computed
142
- as :math:`rng * multiple[0], rng * multiple[1]`.
143
-
144
- Note:
145
- You can set the ``random_range`` argument to **None** to set
146
- an exact size of multiply. |br|
147
- See the example above for how this works.
148
- """
149
- if random_range is None:
150
- size = 1
151
- else:
152
- size = random.randint(*random_range)
153
-
154
- if isinstance(multiple, int):
155
- size = (size * multiple, size * multiple)
156
- else:
157
- size = (size * multiple[0], size * multiple[1])
158
-
159
- self.batch_sampler.new_input_dim = size
160
-
161
- return size
162
-
163
 
164
  def list_collate(batch):
165
  """
@@ -176,3 +104,10 @@ def list_collate(batch):
176
  items[i] = default_collate(items[i])
177
 
178
  return items
 
 
 
 
 
 
 
 
4
 
5
  import os
6
  import random
7
+ import uuid
8
+
9
+ import numpy as np
10
 
11
  import torch
12
  from torch.utils.data.dataloader import DataLoader as torchDataLoader
 
35
  See :class:`torch.utils.data.DataLoader` for more information on the arguments.
36
  Check more on the following website:
37
  https://gitlab.com/EAVISE/lightnet/-/blob/master/lightnet/data/_dataloading.py
 
 
38
  """
39
 
40
  def __init__(self, *args, **kwargs):
 
88
  def close_mosaic(self):
89
  self.batch_sampler.mosaic = False
90
 
 
 
91
 
92
  def list_collate(batch):
93
  """
 
104
  items[i] = default_collate(items[i])
105
 
106
  return items
107
+
108
+
109
+ def worker_init_reset_seed(worker_id):
110
+ seed = uuid.uuid4().int % 2**32
111
+ random.seed(seed)
112
+ torch.set_rng_state(torch.manual_seed(seed).get_state())
113
+ np.random.seed(seed)
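
A hedged sketch of how a per-worker reseeding hook like `worker_init_reset_seed` is wired into a loader; the dataset below is a stand-in, not the repo's, and only illustrates the `worker_init_fn` mechanism:

```python
# Illustrative only: give every dataloader worker an independent random seed.
import random
import uuid

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def reset_seed(worker_id):
    seed = uuid.uuid4().int % 2 ** 32  # independent of the parent process RNG state
    random.seed(seed)
    torch.manual_seed(seed)
    np.random.seed(seed)


dataset = TensorDataset(torch.arange(16, dtype=torch.float32))
loader = DataLoader(dataset, batch_size=4, num_workers=2, worker_init_fn=reset_seed)
```
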
yolox/data/datasets/coco.py CHANGED
@@ -3,6 +3,7 @@
3
  # Copyright (c) Megvii, Inc. and its affiliates.
4
 
5
  import os
 
6
 
7
  import cv2
8
  import numpy as np
@@ -24,6 +25,7 @@ class COCODataset(Dataset):
24
  name="train2017",
25
  img_size=(416, 416),
26
  preproc=None,
 
27
  ):
28
  """
29
  COCO dataset initialization. Annotation data are read into memory by COCO API.
@@ -45,17 +47,70 @@ class COCODataset(Dataset):
45
  self.class_ids = sorted(self.coco.getCatIds())
46
  cats = self.coco.loadCats(self.coco.getCatIds())
47
  self._classes = tuple([c["name"] for c in cats])
48
- self.annotations = self._load_coco_annotations()
49
  self.name = name
50
  self.img_size = img_size
51
  self.preproc = preproc
 
 
 
52
 
53
  def __len__(self):
54
  return len(self.ids)
55
 
 
 
 
56
  def _load_coco_annotations(self):
57
  return [self.load_anno_from_ids(_ids) for _ids in self.ids]
58
 
 
 
59
  def load_anno_from_ids(self, id_):
60
  im_ann = self.coco.loadImgs(id_)[0]
61
  width = im_ann["width"]
@@ -81,32 +136,56 @@ class COCODataset(Dataset):
81
  res[ix, 0:4] = obj["clean_bbox"]
82
  res[ix, 4] = cls
83
 
84
- img_info = (height, width)
 
85
 
86
- file_name = im_ann["file_name"] if "file_name" in im_ann else "{:012}".format(id_) + ".jpg"
 
87
 
88
- del im_ann, annotations
 
 
 
 
89
 
90
- return (res, img_info, file_name)
91
 
92
  def load_anno(self, index):
93
  return self.annotations[index][0]
94
 
95
- def pull_item(self, index):
96
- id_ = self.ids[index]
 
 
 
 
 
 
 
97
 
98
- res, img_info, file_name = self.annotations[index]
99
- # load image and preprocess
100
- img_file = os.path.join(
101
- self.data_dir, self.name, file_name
102
- )
103
 
104
  img = cv2.imread(img_file)
105
  assert img is not None
106
 
 
 
107
  return img, res.copy(), img_info, np.array([id_])
108
 
109
- @Dataset.resize_getitem
110
  def __getitem__(self, index):
111
  """
112
  One image / label pair for the given index is picked up and pre-processed.
@@ -122,10 +201,8 @@ class COCODataset(Dataset):
122
  class (float): class index.
123
  xc, yc (float) : center of bbox whose values range from 0 to 1.
124
  w, h (float) : size of bbox whose values range from 0 to 1.
125
- info_img : tuple of h, w, nh, nw, dx, dy.
126
  h, w (int): original shape of the image
127
- nh, nw (int): shape of the resized image without padding
128
- dx, dy (int): pad size
129
  img_id (int): same as the input index. Used for evaluation.
130
  """
131
  img, target, img_info, img_id = self.pull_item(index)
 
3
  # Copyright (c) Megvii, Inc. and its affiliates.
4
 
5
  import os
6
+ from loguru import logger
7
 
8
  import cv2
9
  import numpy as np
 
25
  name="train2017",
26
  img_size=(416, 416),
27
  preproc=None,
28
+ cache=False,
29
  ):
30
  """
31
  COCO dataset initialization. Annotation data are read into memory by COCO API.
 
47
  self.class_ids = sorted(self.coco.getCatIds())
48
  cats = self.coco.loadCats(self.coco.getCatIds())
49
  self._classes = tuple([c["name"] for c in cats])
50
+ self.imgs = None
51
  self.name = name
52
  self.img_size = img_size
53
  self.preproc = preproc
54
+ self.annotations = self._load_coco_annotations()
55
+ if cache:
56
+ self._cache_images()
57
 
58
  def __len__(self):
59
  return len(self.ids)
60
 
61
+ def __del__(self):
62
+ del self.imgs
63
+
64
  def _load_coco_annotations(self):
65
  return [self.load_anno_from_ids(_ids) for _ids in self.ids]
66
 
67
+ def _cache_images(self):
68
+ logger.warning(
69
+ "\n********************************************************************************\n"
70
+ "You are using cached images in RAM to accelerate training.\n"
71
+ "This requires large system RAM.\n"
72
+ "Make sure you have 200G+ RAM and 136G available disk space for training COCO.\n"
73
+ "********************************************************************************\n"
74
+ )
75
+ max_h = self.img_size[0]
76
+ max_w = self.img_size[1]
77
+ cache_file = self.data_dir + "/img_resized_cache_" + self.name + ".array"
78
+ if not os.path.exists(cache_file):
79
+ logger.info(
80
+ "Caching images for the frist time. This might take about 20 minutes for COCO"
81
+ )
82
+ self.imgs = np.memmap(
83
+ cache_file,
84
+ shape=(len(self.ids), max_h, max_w, 3),
85
+ dtype=np.uint8,
86
+ mode="w+",
87
+ )
88
+ from tqdm import tqdm
89
+ from multiprocessing.pool import ThreadPool
90
+
91
+ NUM_THREADs = min(8, os.cpu_count())
92
+ loaded_images = ThreadPool(NUM_THREADs).imap(
93
+ lambda x: self.load_resized_img(x),
94
+ range(len(self.annotations)),
95
+ )
96
+ pbar = tqdm(enumerate(loaded_images), total=len(self.annotations))
97
+ for k, out in pbar:
98
+ self.imgs[k][: out.shape[0], : out.shape[1], :] = out.copy()
99
+ self.imgs.flush()
100
+ pbar.close()
101
+ else:
102
+ logger.warning(
103
+ "You are using cached imgs! Make sure your dataset is not changed!!"
104
+ )
105
+
106
+ logger.info("Loading cached imgs...")
107
+ self.imgs = np.memmap(
108
+ cache_file,
109
+ shape=(len(self.ids), max_h, max_w, 3),
110
+ dtype=np.uint8,
111
+ mode="r+",
112
+ )
113
+
114
  def load_anno_from_ids(self, id_):
115
  im_ann = self.coco.loadImgs(id_)[0]
116
  width = im_ann["width"]
 
136
  res[ix, 0:4] = obj["clean_bbox"]
137
  res[ix, 4] = cls
138
 
139
+ r = min(self.img_size[0] / height, self.img_size[1] / width)
140
+ res[:, :4] *= r
141
 
142
+ img_info = (height, width)
143
+ resized_info = (int(height * r), int(width * r))
144
 
145
+ file_name = (
146
+ im_ann["file_name"]
147
+ if "file_name" in im_ann
148
+ else "{:012}".format(id_) + ".jpg"
149
+ )
150
 
151
+ return (res, img_info, resized_info, file_name)
152
 
153
  def load_anno(self, index):
154
  return self.annotations[index][0]
155
 
156
+ def load_resized_img(self, index):
157
+ img = self.load_image(index)
158
+ r = min(self.img_size[0] / img.shape[0], self.img_size[1] / img.shape[1])
159
+ resized_img = cv2.resize(
160
+ img,
161
+ (int(img.shape[1] * r), int(img.shape[0] * r)),
162
+ interpolation=cv2.INTER_LINEAR,
163
+ ).astype(np.uint8)
164
+ return resized_img
165
 
166
+ def load_image(self, index):
167
+ file_name = self.annotations[index][3]
168
+
169
+ img_file = os.path.join(self.data_dir, self.name, file_name)
 
170
 
171
  img = cv2.imread(img_file)
172
  assert img is not None
173
 
174
+ return img
175
+
176
+ def pull_item(self, index):
177
+ id_ = self.ids[index]
178
+
179
+ res, img_info, resized_info, _ = self.annotations[index]
180
+ if self.imgs is not None:
181
+ pad_img = self.imgs[index]
182
+ img = pad_img[: resized_info[0], : resized_info[1], :].copy()
183
+ else:
184
+ img = self.load_resized_img(index)
185
+
186
  return img, res.copy(), img_info, np.array([id_])
187
 
188
+ @Dataset.mosaic_getitem
189
  def __getitem__(self, index):
190
  """
191
  One image / label pair for the given index is picked up and pre-processed.
 
201
  class (float): class index.
202
  xc, yc (float) : center of bbox whose values range from 0 to 1.
203
  w, h (float) : size of bbox whose values range from 0 to 1.
204
+ info_img : tuple of h, w.
205
  h, w (int): original shape of the image
 
 
206
  img_id (int): same as the input index. Used for evaluation.
207
  """
208
  img, target, img_info, img_id = self.pull_item(index)
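
The caching logic above amounts to writing every resized image once into a single uint8 array on disk and then serving crops from a shared memmap. A simplified sketch under those assumptions (the path and the loader callback are placeholders):

```python
# Hedged sketch of the np.memmap image cache used by _cache_images.
import numpy as np


def build_image_cache(cache_file, num_imgs, max_h, max_w, load_resized_img):
    imgs = np.memmap(
        cache_file, shape=(num_imgs, max_h, max_w, 3), dtype=np.uint8, mode="w+"
    )
    for k in range(num_imgs):
        out = load_resized_img(k)  # HWC uint8, already resized to fit (max_h, max_w)
        imgs[k, : out.shape[0], : out.shape[1], :] = out
    imgs.flush()  # persist the cache to disk
    # reopen so forked dataloader workers share the same memory-mapped pages
    return np.memmap(
        cache_file, shape=(num_imgs, max_h, max_w, 3), dtype=np.uint8, mode="r+"
    )
```
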
yolox/data/datasets/datasets_wrapper.py CHANGED
@@ -87,42 +87,28 @@ class Dataset(torchDataset):
87
  return self.__input_dim
88
 
89
  @staticmethod
90
- def resize_getitem(getitem_fn):
91
  """
92
  Decorator method that needs to be used around the ``__getitem__`` method. |br|
93
- This decorator enables the on the fly resizing of
94
- the ``input_dim`` with our :class:`~lightnet.data.DataLoader` class.
95
 
96
  Example:
97
  >>> class CustomSet(ln.data.Dataset):
98
  ... def __len__(self):
99
  ... return 10
100
- ... @ln.data.Dataset.resize_getitem
101
  ... def __getitem__(self, index):
102
- ... # Should return (image, anno) but here we return input_dim
103
- ... return self.input_dim
104
- >>> data = CustomSet((200,200))
105
- >>> data[0]
106
- (200, 200)
107
- >>> data[(480,320), 0]
108
- (480, 320)
109
  """
110
 
111
  @wraps(getitem_fn)
112
  def wrapper(self, index):
113
  if not isinstance(index, int):
114
- has_dim = True
115
- self._input_dim = index[0]
116
- self.enable_mosaic = index[2]
117
  index = index[1]
118
- else:
119
- has_dim = False
120
 
121
  ret_val = getitem_fn(self, index)
122
 
123
- if has_dim:
124
- del self._input_dim
125
-
126
  return ret_val
127
 
128
  return wrapper
 
87
  return self.__input_dim
88
 
89
  @staticmethod
90
+ def mosaic_getitem(getitem_fn):
91
  """
92
  Decorator method that needs to be used around the ``__getitem__`` method. |br|
93
+ This decorator enables switching the mosaic augmentation on and off via the batch sampler.
 
94
 
95
  Example:
96
  >>> class CustomSet(ln.data.Dataset):
97
  ... def __len__(self):
98
  ... return 10
99
+ ... @ln.data.Dataset.mosaic_getitem
100
  ... def __getitem__(self, index):
101
+ ... return self.enable_mosaic
 
 
 
 
 
 
102
  """
103
 
104
  @wraps(getitem_fn)
105
  def wrapper(self, index):
106
  if not isinstance(index, int):
107
+ self.enable_mosaic = index[0]
 
 
108
  index = index[1]
 
 
109
 
110
  ret_val = getitem_fn(self, index)
111
 
 
 
 
112
  return ret_val
113
 
114
  return wrapper
yolox/data/datasets/mosaicdetection.py CHANGED
@@ -7,7 +7,7 @@ import random
7
  import cv2
8
  import numpy as np
9
 
10
- from yolox.utils import adjust_box_anns
11
 
12
  from ..data_augment import box_candidates, random_perspective
13
  from .datasets_wrapper import Dataset
@@ -40,7 +40,8 @@ class MosaicDetection(Dataset):
40
  def __init__(
41
  self, dataset, img_size, mosaic=True, preproc=None,
42
  degrees=10.0, translate=0.1, scale=(0.5, 1.5), mscale=(0.5, 1.5),
43
- shear=2.0, perspective=0.0, enable_mixup=True, *args
 
44
  ):
45
  """
46
 
@@ -69,13 +70,16 @@ class MosaicDetection(Dataset):
69
  self.mixup_scale = mscale
70
  self.enable_mosaic = mosaic
71
  self.enable_mixup = enable_mixup
 
 
 
72
 
73
  def __len__(self):
74
  return len(self._dataset)
75
 
76
- @Dataset.resize_getitem
77
  def __getitem__(self, idx):
78
- if self.enable_mosaic:
79
  mosaic_labels = []
80
  input_dim = self._dataset.input_dim
81
  input_h, input_w = input_dim[0], input_dim[1]
@@ -137,7 +141,11 @@ class MosaicDetection(Dataset):
137
  # -----------------------------------------------------------------
138
  # CopyPaste: https://arxiv.org/abs/2012.07177
139
  # -----------------------------------------------------------------
140
- if self.enable_mixup and not len(mosaic_labels) == 0:
 
 
 
 
141
  mosaic_img, mosaic_labels = self.mixup(mosaic_img, mosaic_labels, self.input_dim)
142
  mix_img, padded_labels = self.preproc(mosaic_img, mosaic_labels, self.input_dim)
143
  img_info = (mix_img.shape[1], mix_img.shape[0])
@@ -160,31 +168,35 @@ class MosaicDetection(Dataset):
160
  img, cp_labels, _, _ = self._dataset.pull_item(cp_index)
161
 
162
  if len(img.shape) == 3:
163
- cp_img = np.ones((input_dim[0], input_dim[1], 3)) * 114.0
164
  else:
165
- cp_img = np.ones(input_dim) * 114.0
 
166
  cp_scale_ratio = min(input_dim[0] / img.shape[0], input_dim[1] / img.shape[1])
167
  resized_img = cv2.resize(
168
  img,
169
  (int(img.shape[1] * cp_scale_ratio), int(img.shape[0] * cp_scale_ratio)),
170
  interpolation=cv2.INTER_LINEAR,
171
- ).astype(np.float32)
 
172
  cp_img[
173
  : int(img.shape[0] * cp_scale_ratio), : int(img.shape[1] * cp_scale_ratio)
174
  ] = resized_img
 
175
  cp_img = cv2.resize(
176
  cp_img,
177
  (int(cp_img.shape[1] * jit_factor), int(cp_img.shape[0] * jit_factor)),
178
  )
179
  cp_scale_ratio *= jit_factor
 
180
  if FLIP:
181
  cp_img = cp_img[:, ::-1, :]
182
 
183
  origin_h, origin_w = cp_img.shape[:2]
184
  target_h, target_w = origin_img.shape[:2]
185
  padded_img = np.zeros(
186
- (max(origin_h, target_h), max(origin_w, target_w), 3)
187
- ).astype(np.uint8)
188
  padded_img[:origin_h, :origin_w] = cp_img
189
 
190
  x_offset, y_offset = 0, 0
@@ -220,4 +232,4 @@ class MosaicDetection(Dataset):
220
  origin_img = origin_img.astype(np.float32)
221
  origin_img = 0.5 * origin_img + 0.5 * padded_cropped_img.astype(np.float32)
222
 
223
- return origin_img, origin_labels
 
7
  import cv2
8
  import numpy as np
9
 
10
+ from yolox.utils import adjust_box_anns, get_local_rank
11
 
12
  from ..data_augment import box_candidates, random_perspective
13
  from .datasets_wrapper import Dataset
 
40
  def __init__(
41
  self, dataset, img_size, mosaic=True, preproc=None,
42
  degrees=10.0, translate=0.1, scale=(0.5, 1.5), mscale=(0.5, 1.5),
43
+ shear=2.0, perspective=0.0, enable_mixup=True,
44
+ mosaic_prob=1.0, mixup_prob=1.0, *args
45
  ):
46
  """
47
 
 
70
  self.mixup_scale = mscale
71
  self.enable_mosaic = mosaic
72
  self.enable_mixup = enable_mixup
73
+ self.mosaic_prob = mosaic_prob
74
+ self.mixup_prob = mixup_prob
75
+ self.local_rank = get_local_rank()
76
 
77
  def __len__(self):
78
  return len(self._dataset)
79
 
80
+ @Dataset.mosaic_getitem
81
  def __getitem__(self, idx):
82
+ if self.enable_mosaic and random.random() < self.mosaic_prob:
83
  mosaic_labels = []
84
  input_dim = self._dataset.input_dim
85
  input_h, input_w = input_dim[0], input_dim[1]
 
141
  # -----------------------------------------------------------------
142
  # CopyPaste: https://arxiv.org/abs/2012.07177
143
  # -----------------------------------------------------------------
144
+ if (
145
+ self.enable_mixup
146
+ and not len(mosaic_labels) == 0
147
+ and random.random() < self.mixup_prob
148
+ ):
149
  mosaic_img, mosaic_labels = self.mixup(mosaic_img, mosaic_labels, self.input_dim)
150
  mix_img, padded_labels = self.preproc(mosaic_img, mosaic_labels, self.input_dim)
151
  img_info = (mix_img.shape[1], mix_img.shape[0])
 
168
  img, cp_labels, _, _ = self._dataset.pull_item(cp_index)
169
 
170
  if len(img.shape) == 3:
171
+ cp_img = np.ones((input_dim[0], input_dim[1], 3), dtype=np.uint8) * 114
172
  else:
173
+ cp_img = np.ones(input_dim, dtype=np.uint8) * 114
174
+
175
  cp_scale_ratio = min(input_dim[0] / img.shape[0], input_dim[1] / img.shape[1])
176
  resized_img = cv2.resize(
177
  img,
178
  (int(img.shape[1] * cp_scale_ratio), int(img.shape[0] * cp_scale_ratio)),
179
  interpolation=cv2.INTER_LINEAR,
180
+ )
181
+
182
  cp_img[
183
  : int(img.shape[0] * cp_scale_ratio), : int(img.shape[1] * cp_scale_ratio)
184
  ] = resized_img
185
+
186
  cp_img = cv2.resize(
187
  cp_img,
188
  (int(cp_img.shape[1] * jit_factor), int(cp_img.shape[0] * jit_factor)),
189
  )
190
  cp_scale_ratio *= jit_factor
191
+
192
  if FLIP:
193
  cp_img = cp_img[:, ::-1, :]
194
 
195
  origin_h, origin_w = cp_img.shape[:2]
196
  target_h, target_w = origin_img.shape[:2]
197
  padded_img = np.zeros(
198
+ (max(origin_h, target_h), max(origin_w, target_w), 3), dtype=np.uint8
199
+ )
200
  padded_img[:origin_h, :origin_w] = cp_img
201
 
202
  x_offset, y_offset = 0, 0
 
232
  origin_img = origin_img.astype(np.float32)
233
  origin_img = 0.5 * origin_img + 0.5 * padded_cropped_img.astype(np.float32)
234
 
235
+ return origin_img.astype(np.uint8), origin_labels
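
Mosaic and mixup are now gated by `mosaic_prob` and `mixup_prob` instead of always firing; the control flow reduces to the small sketch below, where the augmentation callbacks are placeholders:

```python
# Hedged sketch of the new probability gates around mosaic and mixup.
import random


def maybe_augment(sample, mosaic_fn, mixup_fn,
                  enable_mosaic=True, mosaic_prob=1.0,
                  enable_mixup=True, mixup_prob=1.0):
    if enable_mosaic and random.random() < mosaic_prob:
        sample = mosaic_fn(sample)
        if enable_mixup and random.random() < mixup_prob:
            sample = mixup_fn(sample)
    return sample
```
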
yolox/data/datasets/voc.py CHANGED
@@ -10,6 +10,7 @@ import os
10
  import os.path
11
  import pickle
12
  import xml.etree.ElementTree as ET
 
13
 
14
  import cv2
15
  import numpy as np
@@ -35,7 +36,9 @@ class AnnotationTransform(object):
35
  """
36
 
37
  def __init__(self, class_to_ind=None, keep_difficult=True):
38
- self.class_to_ind = class_to_ind or dict(zip(VOC_CLASSES, range(len(VOC_CLASSES))))
 
 
39
  self.keep_difficult = keep_difficult
40
 
41
  def __call__(self, target):
@@ -48,7 +51,11 @@ class AnnotationTransform(object):
48
  """
49
  res = np.empty((0, 5))
50
  for obj in target.iter("object"):
51
- difficult = int(obj.find("difficult").text) == 1
 
 
 
 
52
  if not self.keep_difficult and difficult:
53
  continue
54
  name = obj.find("name").text.strip()
@@ -66,7 +73,11 @@ class AnnotationTransform(object):
66
  res = np.vstack((res, bndbox)) # [xmin, ymin, xmax, ymax, label_ind]
67
  # img_id = target.find('filename').text[:-4]
68
 
69
- return res # [[xmin, ymin, xmax, ymax, label_ind], ... ]
 
 
 
 
70
 
71
 
72
  class VOCDetection(Dataset):
@@ -91,11 +102,12 @@ class VOCDetection(Dataset):
91
  def __init__(
92
  self,
93
  data_dir,
94
- image_sets=[('2007', 'trainval'), ('2012', 'trainval')],
95
  img_size=(416, 416),
96
  preproc=None,
97
  target_transform=AnnotationTransform(),
98
  dataset_name="VOC0712",
 
99
  ):
100
  super().__init__(img_size)
101
  self.root = data_dir
@@ -116,16 +128,98 @@ class VOCDetection(Dataset):
116
  ):
117
  self.ids.append((rootpath, line.strip()))
118
 
 
 
 
 
 
119
  def __len__(self):
120
  return len(self.ids)
121
 
122
- def load_anno(self, index):
 
 
123
  img_id = self.ids[index]
124
  target = ET.parse(self._annopath % img_id).getroot()
125
- if self.target_transform is not None:
126
- target = self.target_transform(target)
127
 
128
- return target
 
 
129
 
130
  def pull_item(self, index):
131
  """Returns the original image and target at an index for mixup
@@ -138,17 +232,17 @@ class VOCDetection(Dataset):
138
  Return:
139
  img, target
140
  """
141
- img_id = self.ids[index]
142
- img = cv2.imread(self._imgpath % img_id, cv2.IMREAD_COLOR)
143
- height, width, _ = img.shape
144
-
145
- target = self.load_anno(index)
146
-
147
- img_info = (height, width)
148
 
149
  return img, target, img_info, index
150
 
151
- @Dataset.resize_getitem
152
  def __getitem__(self, index):
153
  img, target, img_info, img_id = self.pull_item(index)
154
 
@@ -167,7 +261,9 @@ class VOCDetection(Dataset):
167
  all_boxes[class][image] = [] or np.array of shape #dets x 5
168
  """
169
  self._write_voc_results_file(all_boxes)
170
- IouTh = np.linspace(0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True)
 
 
171
  mAPs = []
172
  for iou in IouTh:
173
  mAP = self._do_python_eval(output_dir, iou)
 
10
  import os.path
11
  import pickle
12
  import xml.etree.ElementTree as ET
13
+ from loguru import logger
14
 
15
  import cv2
16
  import numpy as np
 
36
  """
37
 
38
  def __init__(self, class_to_ind=None, keep_difficult=True):
39
+ self.class_to_ind = class_to_ind or dict(
40
+ zip(VOC_CLASSES, range(len(VOC_CLASSES)))
41
+ )
42
  self.keep_difficult = keep_difficult
43
 
44
  def __call__(self, target):
 
51
  """
52
  res = np.empty((0, 5))
53
  for obj in target.iter("object"):
54
+ difficult = obj.find("difficult")
55
+ if difficult is not None:
56
+ difficult = int(difficult.text) == 1
57
+ else:
58
+ difficult = False
59
  if not self.keep_difficult and difficult:
60
  continue
61
  name = obj.find("name").text.strip()
 
73
  res = np.vstack((res, bndbox)) # [xmin, ymin, xmax, ymax, label_ind]
74
  # img_id = target.find('filename').text[:-4]
75
 
76
+ width = int(target.find("size").find("width").text)
77
+ height = int(target.find("size").find("height").text)
78
+ img_info = (height, width)
79
+
80
+ return res, img_info
81
 
82
 
83
  class VOCDetection(Dataset):
 
102
  def __init__(
103
  self,
104
  data_dir,
105
+ image_sets=[("2007", "trainval"), ("2012", "trainval")],
106
  img_size=(416, 416),
107
  preproc=None,
108
  target_transform=AnnotationTransform(),
109
  dataset_name="VOC0712",
110
+ cache=False,
111
  ):
112
  super().__init__(img_size)
113
  self.root = data_dir
 
128
  ):
129
  self.ids.append((rootpath, line.strip()))
130
 
131
+ self.annotations = self._load_coco_annotations()
132
+ self.imgs = None
133
+ if cache:
134
+ self._cache_images()
135
+
136
  def __len__(self):
137
  return len(self.ids)
138
 
139
+ def _load_coco_annotations(self):
140
+ return [self.load_anno_from_ids(_ids) for _ids in range(len(self.ids))]
141
+
142
+ def _cache_images(self):
143
+ logger.warning(
144
+ "\n********************************************************************************\n"
145
+ "You are using cached images in RAM to accelerate training.\n"
146
+ "This requires large system RAM.\n"
147
+ "Make sure you have 60G+ RAM and 19G available disk space for training VOC.\n"
148
+ "********************************************************************************\n"
149
+ )
150
+ max_h = self.img_size[0]
151
+ max_w = self.img_size[1]
152
+ cache_file = self.root + "/img_resized_cache_" + self.name + ".array"
153
+ if not os.path.exists(cache_file):
154
+ logger.info(
155
+ "Caching images for the frist time. This might take about 3 minutes for VOC"
156
+ )
157
+ self.imgs = np.memmap(
158
+ cache_file,
159
+ shape=(len(self.ids), max_h, max_w, 3),
160
+ dtype=np.uint8,
161
+ mode="w+",
162
+ )
163
+ from tqdm import tqdm
164
+ from multiprocessing.pool import ThreadPool
165
+
166
+ NUM_THREADs = min(8, os.cpu_count())
167
+ loaded_images = ThreadPool(NUM_THREADs).imap(
168
+ lambda x: self.load_resized_img(x),
169
+ range(len(self.annotations)),
170
+ )
171
+ pbar = tqdm(enumerate(loaded_images), total=len(self.annotations))
172
+ for k, out in pbar:
173
+ self.imgs[k][: out.shape[0], : out.shape[1], :] = out.copy()
174
+ self.imgs.flush()
175
+ pbar.close()
176
+ else:
177
+ logger.warning(
178
+ "You are using cached imgs! Make sure your dataset is not changed!!"
179
+ )
180
+
181
+ logger.info("Loading cached imgs...")
182
+ self.imgs = np.memmap(
183
+ cache_file,
184
+ shape=(len(self.ids), max_h, max_w, 3),
185
+ dtype=np.uint8,
186
+ mode="r+",
187
+ )
188
+
189
+ def load_anno_from_ids(self, index):
190
  img_id = self.ids[index]
191
  target = ET.parse(self._annopath % img_id).getroot()
 
 
192
 
193
+ assert self.target_transform is not None
194
+ res, img_info = self.target_transform(target)
195
+ height, width = img_info
196
+
197
+ r = min(self.img_size[0] / height, self.img_size[1] / width)
198
+ res[:, :4] *= r
199
+ resized_info = (int(height * r), int(width * r))
200
+
201
+ return (res, img_info, resized_info)
202
+
203
+ def load_anno(self, index):
204
+ return self.annotations[index][0]
205
+
206
+ def load_resized_img(self, index):
207
+ img = self.load_image(index)
208
+ r = min(self.img_size[0] / img.shape[0], self.img_size[1] / img.shape[1])
209
+ resized_img = cv2.resize(
210
+ img,
211
+ (int(img.shape[1] * r), int(img.shape[0] * r)),
212
+ interpolation=cv2.INTER_LINEAR,
213
+ ).astype(np.uint8)
214
+
215
+ return resized_img
216
+
217
+ def load_image(self, index):
218
+ img_id = self.ids[index]
219
+ img = cv2.imread(self._imgpath % img_id, cv2.IMREAD_COLOR)
220
+ assert img is not None
221
+
222
+ return img
223
 
224
  def pull_item(self, index):
225
  """Returns the original image and target at an index for mixup
 
232
  Return:
233
  img, target
234
  """
235
+ if self.imgs is not None:
236
+ target, img_info, resized_info = self.annotations[index]
237
+ pad_img = self.imgs[index]
238
+ img = pad_img[: resized_info[0], : resized_info[1], :].copy()
239
+ else:
240
+ img = self.load_resized_img(index)
241
+ target, img_info, _ = self.annotations[index]
242
 
243
  return img, target, img_info, index
244
 
245
+ @Dataset.mosaic_getitem
246
  def __getitem__(self, index):
247
  img, target, img_info, img_id = self.pull_item(index)
248
 
 
261
  all_boxes[class][image] = [] or np.array of shape #dets x 5
262
  """
263
  self._write_voc_results_file(all_boxes)
264
+ IouTh = np.linspace(
265
+ 0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True
266
+ )
267
  mAPs = []
268
  for iou in IouTh:
269
  mAP = self._do_python_eval(output_dir, iou)
yolox/data/samplers.py CHANGED
@@ -13,28 +13,18 @@ from torch.utils.data.sampler import Sampler
13
 
14
  class YoloBatchSampler(torchBatchSampler):
15
  """
16
- This batch sampler will generate mini-batches of (dim, index) tuples from another sampler.
17
  It works just like the :class:`torch.utils.data.sampler.BatchSampler`,
18
- but it will prepend a dimension, whilst ensuring it stays the same across one mini-batch.
19
  """
20
 
21
- def __init__(self, *args, input_dimension=None, mosaic=True, **kwargs):
22
  super().__init__(*args, **kwargs)
23
- self.input_dim = input_dimension
24
- self.new_input_dim = None
25
  self.mosaic = mosaic
26
 
27
  def __iter__(self):
28
- self.__set_input_dim()
29
  for batch in super().__iter__():
30
- yield [(self.input_dim, idx, self.mosaic) for idx in batch]
31
- self.__set_input_dim()
32
-
33
- def __set_input_dim(self):
34
- """ This function randomly changes the the input dimension of the dataset. """
35
- if self.new_input_dim is not None:
36
- self.input_dim = (self.new_input_dim[0], self.new_input_dim[1])
37
- self.new_input_dim = None
38
 
39
 
40
  class InfiniteSampler(Sampler):
 
13
 
14
  class YoloBatchSampler(torchBatchSampler):
15
  """
16
+ This batch sampler will generate mini-batches of (mosaic, index) tuples from another sampler.
17
  It works just like the :class:`torch.utils.data.sampler.BatchSampler`,
18
+ but it will turn the mosaic augmentation on or off.
19
  """
20
 
21
+ def __init__(self, *args, mosaic=True, **kwargs):
22
  super().__init__(*args, **kwargs)
 
 
23
  self.mosaic = mosaic
24
 
25
  def __iter__(self):
 
26
  for batch in super().__iter__():
27
+ yield [(self.mosaic, idx) for idx in batch]
 
 
 
 
 
 
 
28
 
29
 
30
  class InfiniteSampler(Sampler):
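
The sampler/dataset handshake is simple: the batch sampler tags every index with the current mosaic flag, and `mosaic_getitem` unpacks it on the dataset side. A self-contained illustration (these are stand-in classes, not the repo's):

```python
# Hedged sketch of the (mosaic, index) batching contract.
from torch.utils.data.sampler import BatchSampler, SequentialSampler


class FlaggedBatchSampler(BatchSampler):
    def __init__(self, *args, mosaic=True, **kwargs):
        super().__init__(*args, **kwargs)
        self.mosaic = mosaic

    def __iter__(self):
        for batch in super().__iter__():
            # every index carries the current mosaic on/off flag
            yield [(self.mosaic, idx) for idx in batch]


sampler = FlaggedBatchSampler(SequentialSampler(range(8)), batch_size=4, drop_last=False)
print(list(sampler))  # [[(True, 0), (True, 1), ...], [(True, 4), ...]]
```
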
yolox/exp/yolox_base.py CHANGED
@@ -13,7 +13,6 @@ from .base_exp import BaseExp
13
 
14
 
15
  class Exp(BaseExp):
16
-
17
  def __init__(self):
18
  super().__init__()
19
 
@@ -32,6 +31,8 @@ class Exp(BaseExp):
32
  self.val_ann = "instances_val2017.json"
33
 
34
  # --------------- transform config ----------------- #
 
 
35
  self.degrees = 10.0
36
  self.translate = 0.1
37
  self.scale = (0.1, 2)
@@ -80,7 +81,9 @@ class Exp(BaseExp):
80
  self.model.head.initialize_biases(1e-2)
81
  return self.model
82
 
83
- def get_data_loader(self, batch_size, is_distributed, no_aug=False):
 
 
84
  from yolox.data import (
85
  COCODataset,
86
  TrainTransform,
@@ -88,34 +91,37 @@ class Exp(BaseExp):
88
  DataLoader,
89
  InfiniteSampler,
90
  MosaicDetection,
 
91
  )
92
-
93
- dataset = COCODataset(
94
- data_dir=self.data_dir,
95
- json_file=self.train_ann,
96
- img_size=self.input_size,
97
- preproc=TrainTransform(
98
- rgb_means=(0.485, 0.456, 0.406),
99
- std=(0.229, 0.224, 0.225),
100
- max_labels=50,
101
- ),
102
  )
103
 
 
 
 
 
 
 
 
 
 
 
 
104
  dataset = MosaicDetection(
105
  dataset,
106
  mosaic=not no_aug,
107
  img_size=self.input_size,
108
- preproc=TrainTransform(
109
- rgb_means=(0.485, 0.456, 0.406),
110
- std=(0.229, 0.224, 0.225),
111
- max_labels=120,
112
- ),
113
  degrees=self.degrees,
114
  translate=self.translate,
115
  scale=self.scale,
116
  shear=self.shear,
117
  perspective=self.perspective,
118
  enable_mixup=self.enable_mixup,
 
 
119
  )
120
 
121
  self.dataset = dataset
@@ -123,20 +129,22 @@ class Exp(BaseExp):
123
  if is_distributed:
124
  batch_size = batch_size // dist.get_world_size()
125
 
126
- sampler = InfiniteSampler(
127
- len(self.dataset), seed=self.seed if self.seed else 0
128
- )
129
 
130
  batch_sampler = YoloBatchSampler(
131
  sampler=sampler,
132
  batch_size=batch_size,
133
  drop_last=False,
134
- input_dimension=self.input_size,
135
  mosaic=not no_aug,
136
  )
137
 
138
  dataloader_kwargs = {"num_workers": self.data_num_workers, "pin_memory": True}
139
  dataloader_kwargs["batch_sampler"] = batch_sampler
 
 
 
 
 
140
  train_loader = DataLoader(self.dataset, **dataloader_kwargs)
141
 
142
  return train_loader
@@ -145,7 +153,7 @@ class Exp(BaseExp):
145
  tensor = torch.LongTensor(2).cuda()
146
 
147
  if rank == 0:
148
- size_factor = self.input_size[1] * 1. / self.input_size[0]
149
  size = random.randint(*self.random_size)
150
  size = (int(32 * size), 32 * int(size * size_factor))
151
  tensor[0] = size[0]
@@ -155,11 +163,18 @@ class Exp(BaseExp):
155
  dist.barrier()
156
  dist.broadcast(tensor, 0)
157
 
158
- input_size = data_loader.change_input_dim(
159
- multiple=(tensor[0].item(), tensor[1].item()), random_range=None
160
- )
161
  return input_size
162
 
 
 
 
 
 
 
 
 
 
163
  def get_optimizer(self, batch_size):
164
  if "optimizer" not in self.__dict__:
165
  if self.warmup_epochs > 0:
@@ -190,6 +205,7 @@ class Exp(BaseExp):
190
 
191
  def get_lr_scheduler(self, lr, iters_per_epoch):
192
  from yolox.utils import LRScheduler
 
193
  scheduler = LRScheduler(
194
  self.scheduler,
195
  lr,
@@ -202,7 +218,7 @@ class Exp(BaseExp):
202
  )
203
  return scheduler
204
 
205
- def get_eval_loader(self, batch_size, is_distributed, testdev=False):
206
  from yolox.data import COCODataset, ValTransform
207
 
208
  valdataset = COCODataset(
@@ -210,9 +226,7 @@ class Exp(BaseExp):
210
  json_file=self.val_ann if not testdev else "image_info_test-dev2017.json",
211
  name="val2017" if not testdev else "test2017",
212
  img_size=self.test_size,
213
- preproc=ValTransform(
214
- rgb_means=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)
215
- ),
216
  )
217
 
218
  if is_distributed:
@@ -233,10 +247,10 @@ class Exp(BaseExp):
233
 
234
  return val_loader
235
 
236
- def get_evaluator(self, batch_size, is_distributed, testdev=False):
237
  from yolox.evaluators import COCOEvaluator
238
 
239
- val_loader = self.get_eval_loader(batch_size, is_distributed, testdev=testdev)
240
  evaluator = COCOEvaluator(
241
  dataloader=val_loader,
242
  img_size=self.test_size,
 
13
 
14
 
15
  class Exp(BaseExp):
 
16
  def __init__(self):
17
  super().__init__()
18
 
 
31
  self.val_ann = "instances_val2017.json"
32
 
33
  # --------------- transform config ----------------- #
34
+ self.mosaic_prob = 1.0
35
+ self.mixup_prob = 1.0
36
  self.degrees = 10.0
37
  self.translate = 0.1
38
  self.scale = (0.1, 2)
 
81
  self.model.head.initialize_biases(1e-2)
82
  return self.model
83
 
84
+ def get_data_loader(
85
+ self, batch_size, is_distributed, no_aug=False, cache_img=False
86
+ ):
87
  from yolox.data import (
88
  COCODataset,
89
  TrainTransform,
 
91
  DataLoader,
92
  InfiniteSampler,
93
  MosaicDetection,
94
+ worker_init_reset_seed,
95
  )
96
+ from yolox.utils import (
97
+ wait_for_the_master,
98
+ get_local_rank,
 
 
 
 
 
 
 
99
  )
100
 
101
+ local_rank = get_local_rank()
102
+
103
+ with wait_for_the_master(local_rank):
104
+ dataset = COCODataset(
105
+ data_dir=self.data_dir,
106
+ json_file=self.train_ann,
107
+ img_size=self.input_size,
108
+ preproc=TrainTransform(max_labels=50),
109
+ cache=cache_img,
110
+ )
111
+
112
  dataset = MosaicDetection(
113
  dataset,
114
  mosaic=not no_aug,
115
  img_size=self.input_size,
116
+ preproc=TrainTransform(max_labels=120),
 
 
 
 
117
  degrees=self.degrees,
118
  translate=self.translate,
119
  scale=self.scale,
120
  shear=self.shear,
121
  perspective=self.perspective,
122
  enable_mixup=self.enable_mixup,
123
+ mosaic_prob=self.mosaic_prob,
124
+ mixup_prob=self.mixup_prob,
125
  )
126
 
127
  self.dataset = dataset
 
129
  if is_distributed:
130
  batch_size = batch_size // dist.get_world_size()
131
 
132
+ sampler = InfiniteSampler(len(self.dataset), seed=self.seed if self.seed else 0)
 
 
133
 
134
  batch_sampler = YoloBatchSampler(
135
  sampler=sampler,
136
  batch_size=batch_size,
137
  drop_last=False,
 
138
  mosaic=not no_aug,
139
  )
140
 
141
  dataloader_kwargs = {"num_workers": self.data_num_workers, "pin_memory": True}
142
  dataloader_kwargs["batch_sampler"] = batch_sampler
143
+
144
+ # Make sure each process has a different random seed, especially for the 'fork' start method.
145
+ # Check https://github.com/pytorch/pytorch/issues/63311 for more details.
146
+ dataloader_kwargs["worker_init_fn"] = worker_init_reset_seed
147
+
148
  train_loader = DataLoader(self.dataset, **dataloader_kwargs)
149
 
150
  return train_loader
 
153
  tensor = torch.LongTensor(2).cuda()
154
 
155
  if rank == 0:
156
+ size_factor = self.input_size[1] * 1.0 / self.input_size[0]
157
  size = random.randint(*self.random_size)
158
  size = (int(32 * size), 32 * int(size * size_factor))
159
  tensor[0] = size[0]
 
163
  dist.barrier()
164
  dist.broadcast(tensor, 0)
165
 
166
+ input_size = (tensor[0].item(), tensor[1].item())
 
 
167
  return input_size
168
 
169
+ def preprocess(self, inputs, targets, tsize):
170
+ scale = tsize[0] / self.input_size[0]
171
+ if scale != 1:
172
+ inputs = nn.functional.interpolate(
173
+ inputs, size=tsize, mode="bilinear", align_corners=False
174
+ )
175
+ targets[..., 1:] = targets[..., 1:] * scale
176
+ return inputs, targets
177
+
178
  def get_optimizer(self, batch_size):
179
  if "optimizer" not in self.__dict__:
180
  if self.warmup_epochs > 0:
 
205
 
206
  def get_lr_scheduler(self, lr, iters_per_epoch):
207
  from yolox.utils import LRScheduler
208
+
209
  scheduler = LRScheduler(
210
  self.scheduler,
211
  lr,
 
218
  )
219
  return scheduler
220
 
221
+ def get_eval_loader(self, batch_size, is_distributed, testdev=False, legacy=False):
222
  from yolox.data import COCODataset, ValTransform
223
 
224
  valdataset = COCODataset(
 
226
  json_file=self.val_ann if not testdev else "image_info_test-dev2017.json",
227
  name="val2017" if not testdev else "test2017",
228
  img_size=self.test_size,
229
+ preproc=ValTransform(legacy=legacy),
 
 
230
  )
231
 
232
  if is_distributed:
 
247
 
248
  return val_loader
249
 
250
+ def get_evaluator(self, batch_size, is_distributed, testdev=False, legacy=False):
251
  from yolox.evaluators import COCOEvaluator
252
 
253
+ val_loader = self.get_eval_loader(batch_size, is_distributed, testdev, legacy)
254
  evaluator = COCOEvaluator(
255
  dataloader=val_loader,
256
  img_size=self.test_size,
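
With `change_input_dim` gone, multi-scale training now happens on the GPU: the whole batch and its box targets are rescaled just before the forward pass via `Exp.preprocess`. A standalone sketch of that step, using placeholder tensors rather than real training data:

```python
# Hedged sketch of the batch-level multi-scale resize done in Exp.preprocess.
import torch
import torch.nn.functional as F


def rescale_batch(inputs, targets, tsize, base_size):
    scale = tsize[0] / base_size[0]
    if scale != 1:
        inputs = F.interpolate(inputs, size=tsize, mode="bilinear", align_corners=False)
        targets = targets.clone()
        targets[..., 1:] *= scale  # cx, cy, w, h scale together with the image
    return inputs, targets


imgs = torch.rand(2, 3, 640, 640)   # dummy batch
labels = torch.zeros(2, 50, 5)      # [class, cx, cy, w, h] per object
imgs, labels = rescale_batch(imgs, labels, (480, 480), (640, 640))
```
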
yolox/models/yolo_head.py CHANGED
@@ -486,13 +486,14 @@ class YOLOXHead(nn.Module):
486
  if mode == "cpu":
487
  cls_preds_, obj_preds_ = cls_preds_.cpu(), obj_preds_.cpu()
488
 
489
- cls_preds_ = (
490
- cls_preds_.float().unsqueeze(0).repeat(num_gt, 1, 1).sigmoid_()
491
- * obj_preds_.unsqueeze(0).repeat(num_gt, 1, 1).sigmoid_()
492
- )
493
- pair_wise_cls_loss = F.binary_cross_entropy(
494
- cls_preds_.sqrt_(), gt_cls_per_image, reduction="none"
495
- ).sum(-1)
 
496
  del cls_preds_
497
 
498
  cost = (
 
486
  if mode == "cpu":
487
  cls_preds_, obj_preds_ = cls_preds_.cpu(), obj_preds_.cpu()
488
 
489
+ with torch.cuda.amp.autocast(enabled=False):
490
+ cls_preds_ = (
491
+ cls_preds_.float().unsqueeze(0).repeat(num_gt, 1, 1).sigmoid_()
492
+ * obj_preds_.unsqueeze(0).repeat(num_gt, 1, 1).sigmoid_()
493
+ )
494
+ pair_wise_cls_loss = F.binary_cross_entropy(
495
+ cls_preds_.sqrt_(), gt_cls_per_image, reduction="none"
496
+ ).sum(-1)
497
  del cls_preds_
498
 
499
  cost = (
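
The head change is the usual trick for numerically sensitive terms under AMP: wrap just that computation in `autocast(enabled=False)` and upcast its operands so it runs in fp32 even inside a mixed-precision region. A distilled sketch, with hypothetical argument names:

```python
# Hedged sketch: force an fp32 island inside an AMP region for a sensitive loss term.
import torch
import torch.nn.functional as F


def pairwise_cls_cost(cls_logits, obj_logits, gt_cls_per_image):
    with torch.cuda.amp.autocast(enabled=False):
        joint = cls_logits.float().sigmoid() * obj_logits.float().sigmoid()
        return F.binary_cross_entropy(
            joint.sqrt(), gt_cls_per_image, reduction="none"
        ).sum(-1)
```
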
yolox/utils/dist.py CHANGED
@@ -10,9 +10,11 @@ This is useful when doing distributed training.
10
  """
11
 
12
  import functools
13
- import logging
14
  import pickle
15
  import time
 
 
16
 
17
  import numpy as np
18
 
@@ -20,6 +22,8 @@ import torch
20
  from torch import distributed as dist
21
 
22
  __all__ = [
 
 
23
  "is_main_process",
24
  "synchronize",
25
  "get_world_size",
@@ -34,6 +38,33 @@ __all__ = [
34
  _LOCAL_PROCESS_GROUP = None
35
 
36
 
 
 
37
  def synchronize():
38
  """
39
  Helper function to synchronize (barrier) among all processes when using distributed training
@@ -112,7 +143,6 @@ def _serialize_to_tensor(data, group):
112
 
113
  buffer = pickle.dumps(data)
114
  if len(buffer) > 1024 ** 3:
115
- logger = logging.getLogger(__name__)
116
  logger.warning(
117
  "Rank {} trying to all-gather {:.2f} GB of data on device {}".format(
118
  get_rank(), len(buffer) / (1024 ** 3), device
 
10
  """
11
 
12
  import functools
13
+ import os
14
  import pickle
15
  import time
16
+ from contextlib import contextmanager
17
+ from loguru import logger
18
 
19
  import numpy as np
20
 
 
22
  from torch import distributed as dist
23
 
24
  __all__ = [
25
+ "get_num_devices",
26
+ "wait_for_the_master",
27
  "is_main_process",
28
  "synchronize",
29
  "get_world_size",
 
38
  _LOCAL_PROCESS_GROUP = None
39
 
40
 
41
+ def get_num_devices():
42
+ gpu_list = os.getenv('CUDA_VISIBLE_DEVICES', None)
43
+ if gpu_list is not None:
44
+ return len(gpu_list.split(','))
45
+ else:
46
+ devices_list_info = os.popen("nvidia-smi -L")
47
+ devices_list_info = devices_list_info.read().strip().split("\n")
48
+ return len(devices_list_info)
49
+
50
+
51
+ @contextmanager
52
+ def wait_for_the_master(local_rank: int):
53
+ """
54
+ Make all processes wait until the master has finished its task.
55
+ """
56
+ if local_rank > 0:
57
+ dist.barrier()
58
+ yield
59
+ if local_rank == 0:
60
+ if not dist.is_available():
61
+ return
62
+ if not dist.is_initialized():
63
+ return
64
+ else:
65
+ dist.barrier()
66
+
67
+
68
  def synchronize():
69
  """
70
  Helper function to synchronize (barrier) among all processes when using distributed training
 
143
 
144
  buffer = pickle.dumps(data)
145
  if len(buffer) > 1024 ** 3:
 
146
  logger.warning(
147
  "Rank {} trying to all-gather {:.2f} GB of data on device {}".format(
148
  get_rank(), len(buffer) / (1024 ** 3), device