Unified Perceptual Parsing for Scene Understanding
Introduction
[ALGORITHM]
@inproceedings{xiao2018unified,
  title={Unified perceptual parsing for scene understanding},
  author={Xiao, Tete and Liu, Yingcheng and Zhou, Bolei and Jiang, Yuning and Sun, Jian},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={418--434},
  year={2018}
}
Results and models
Cityscapes
| Method | Backbone | Crop Size | Lr schd (iters) | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download |
|--------|----------|-----------|-----------------|----------|----------------|------|---------------|----------|
| UPerNet | R-50 | 512x1024 | 40000 | 6.4 | 4.25 | 77.10 | 78.37 | model / log |
| UPerNet | R-101 | 512x1024 | 40000 | 7.4 | 3.79 | 78.69 | 80.11 | model / log |
| UPerNet | R-50 | 769x769 | 40000 | 7.2 | 1.76 | 77.98 | 79.70 | model / log |
| UPerNet | R-101 | 769x769 | 40000 | 8.4 | 1.56 | 79.03 | 80.77 | model / log |
| UPerNet | R-50 | 512x1024 | 80000 | - | - | 78.19 | 79.19 | model / log |
| UPerNet | R-101 | 512x1024 | 80000 | - | - | 79.40 | 80.46 | model / log |
| UPerNet | R-50 | 769x769 | 80000 | - | - | 79.39 | 80.92 | model / log |
| UPerNet | R-101 | 769x769 | 80000 | - | - | 80.10 | 81.49 | model / log |
ADE20K
| Method | Backbone | Crop Size | Lr schd (iters) | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download |
|--------|----------|-----------|-----------------|----------|----------------|------|---------------|----------|
| UPerNet | R-50 | 512x512 | 80000 | 8.1 | 23.40 | 40.70 | 41.81 | model / log |
| UPerNet | R-101 | 512x512 | 80000 | 9.1 | 20.34 | 42.91 | 43.96 | model / log |
| UPerNet | R-50 | 512x512 | 160000 | - | - | 42.05 | 42.78 | model / log |
| UPerNet | R-101 | 512x512 | 160000 | - | - | 43.82 | 44.85 | model / log |
Pascal VOC 2012 + Aug
| Method | Backbone | Crop Size | Lr schd (iters) | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download |
|--------|----------|-----------|-----------------|----------|----------------|------|---------------|----------|
| UPerNet | R-50 | 512x512 | 20000 | 6.4 | 23.17 | 74.82 | 76.35 | model / log |
| UPerNet | R-101 | 512x512 | 20000 | 7.5 | 19.98 | 77.10 | 78.29 | model / log |
| UPerNet | R-50 | 512x512 | 40000 | - | - | 75.92 | 77.44 | model / log |
| UPerNet | R-101 | 512x512 | 40000 | - | - | 77.43 | 78.56 | model / log |
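
The mIoU columns in the tables above are percentages. As a rough illustration of the usual definition (per-class intersection over union, averaged over classes), the sketch below accumulates a confusion matrix and derives mIoU from it. It is not the evaluation code used to produce the reported numbers; the function names `confusion_matrix` and `mean_iou`, the toy arrays, and the `ignore_index=255` default are assumptions made for this example.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Build a num_classes x num_classes matrix (rows: ground truth, cols: prediction).

    ignore_index=255 is an assumed convention for unlabeled pixels.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    keep = target != ignore_index  # drop pixels marked as "ignore"
    # Encode each (target, pred) pair as a single index, then count occurrences.
    idx = target[keep].astype(np.int64) * num_classes + pred[keep].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN); return the mean over classes, in percent."""
    tp = np.diag(conf).astype(np.float64)
    # Union = predicted pixels + ground-truth pixels - overlap.
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    valid = union > 0  # skip classes absent from both prediction and ground truth
    return 100.0 * (tp[valid] / union[valid]).mean()


# Toy 2x3 label maps with 3 classes; one pixel is ignored.
target = np.array([[0, 0, 1], [1, 2, 255]])
pred = np.array([[0, 1, 1], [1, 2, 2]])
miou = mean_iou(confusion_matrix(pred, target, num_classes=3))
print(f"mIoU: {miou:.2f}")
```

For a full dataset, the confusion matrices of all images would typically be summed before calling `mean_iou`, so that each class IoU is computed over the whole validation set rather than averaged per image.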