haolongzhangm committed 774bd19 (parent: dbd24dd)

chore(demo): add MegEngine benchmark (#156)

demo/MegEngine/cpp/README.md CHANGED (+19 -0)
```diff
@@ -114,6 +114,25 @@ Cpp file compile of YOLOX object detection base on [MegEngine](https://github.co
 * <use_weight_preprocess> if >=1, will handle weight preprocess before exe
 * <run_with_fp16> if >=1, will run with fp16 mode
 
+## Benchmark
+
+* model info: yolox-s @ input(1,3,640,640)
+
+* test devices
+
+* x86_64 -- Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
+* aarch64 -- Xiaomi Mi 9 phone
+* cuda -- 1080TI @ cuda-10.1-cudnn-v7.6.3-TensorRT-6.0.1.5.sh @ Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
+
+| megengine @ tag1.4(fastrun + weight\_preprocess)/sec | 1 thread |
+| ---------------------------------------------------- | -------- |
+| x86\_64 | 0.516245 |
+| aarch64(fp32+chw44) | 0.587857 |
+
+| CUDA @ 1080TI/sec | 1 batch | 2 batch | 4 batch | 8 batch | 16 batch | 32 batch | 64 batch |
+| ------------------- | ---------- | --------- | --------- | --------- | --------- | -------- | -------- |
+| megengine(fp32+chw) | 0.00813703 | 0.0132893 | 0.0236633 | 0.0444699 | 0.0864917 | 0.16895 | 0.334248 |
+
 ## Acknowledgement
 
 * [MegEngine](https://github.com/MegEngine/MegEngine)
```
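The benchmark tables report raw seconds, which makes the effect of batching hard to read at a glance. Below is a small Python sketch that converts them into throughput (images/s) and amortized per-image latency. It assumes each entry is the wall-clock time in seconds for one forward pass over the whole batch, since the tables only label the values "/sec". The names `CUDA_LATENCY_S`, `CPU_LATENCY_S`, `throughput`, `per_image_ms` and `cuda_report` are hypothetical helpers for illustration and are not part of the MegEngine demo.

```python
"""Derive throughput and per-image latency from the MegEngine benchmark tables.

Assumption (not stated in the README): each value is the wall-clock time in
seconds for one forward pass over the whole batch.
"""

# CUDA @ 1080TI, megengine(fp32+chw): batch size -> seconds per forward pass
CUDA_LATENCY_S = {
    1: 0.00813703,
    2: 0.0132893,
    4: 0.0236633,
    8: 0.0444699,
    16: 0.0864917,
    32: 0.16895,
    64: 0.334248,
}

# Single-thread CPU rows (batch size 1): device -> seconds per forward pass
CPU_LATENCY_S = {
    "x86_64": 0.516245,
    "aarch64(fp32+chw44)": 0.587857,
}


def throughput(batch: int, latency_s: float) -> float:
    """Images processed per second when one batch takes `latency_s` seconds."""
    return batch / latency_s


def per_image_ms(batch: int, latency_s: float) -> float:
    """Batch latency amortized over its images, in milliseconds."""
    return latency_s / batch * 1000.0


def cuda_report() -> list[tuple[int, float, float]]:
    """(batch, images/s, ms/image) for every CUDA batch size, sorted by batch."""
    return [
        (b, throughput(b, t), per_image_ms(b, t))
        for b, t in sorted(CUDA_LATENCY_S.items())
    ]


if __name__ == "__main__":
    for batch, ips, ms in cuda_report():
        print(f"batch {batch:>2}: {ips:7.1f} img/s, {ms:6.3f} ms/img")
    for device, t in CPU_LATENCY_S.items():
        print(f"{device}: {throughput(1, t):.2f} img/s")
```

Read this way, the 1080TI numbers rise from about 123 img/s at batch 1 to about 191 img/s at batch 64, which is roughly 8.1 ms down to 5.2 ms per image. Most of that gain is already reached by batch 16 (about 185 img/s). The single-thread CPU rows work out to about 1.9 img/s on x86_64 and 1.7 img/s on aarch64.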