SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference
Abstract
The combination of Neural Architecture Search (NAS) and quantization has proven successful in automatically designing low-FLOPs INT8 quantized neural networks (QNN). However, directly applying NAS to design accurate QNN models that achieve low latency on real-world devices leads to inferior performance. In this work, we find that the poor INT8 latency is due to the quantization-unfriendly issue: the operator and configuration (e.g., channel width) choices in prior-art search spaces lead to diverse quantization efficiency and can slow down INT8 inference. To address this challenge, we propose SpaceEvo, an automatic method for designing a dedicated, quantization-friendly search space for each target hardware. The key idea of SpaceEvo is to automatically search hardware-preferred operators and configurations to construct the search space, guided by a metric called the Q-T score, which quantifies how quantization-friendly a candidate search space is. We further train a quantized-for-all supernet over our discovered search space, enabling the searched models to be deployed directly without extra retraining or quantization. Our discovered models establish new state-of-the-art INT8 quantized accuracy under various latency constraints, achieving up to 10.1% higher accuracy on ImageNet than prior-art CNNs under the same latency. Extensive experiments on diverse edge devices demonstrate that SpaceEvo consistently outperforms existing manually designed search spaces, with up to 2.5x faster speed at the same accuracy.