SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference
Abstract
The combination of Neural Architecture Search (NAS) and quantization has proven successful in automatically designing low-FLOPs INT8 quantized neural networks (QNN). However, directly applying NAS to design accurate QNN models that achieve low latency on real-world devices leads to inferior performance. In this work, we find that the poor INT8 latency is due to the quantization-unfriendly issue: the operator and configuration (e.g., channel width) choices in prior-art search spaces lead to diverse quantization efficiency and can slow down INT8 inference. To address this challenge, we propose SpaceEvo, an automatic method for designing a dedicated, quantization-friendly search space for each target hardware. The key idea of SpaceEvo is to automatically search hardware-preferred operators and configurations to construct the search space, guided by a metric called the Q-T score, which quantifies how quantization-friendly a candidate search space is. We further train a quantized-for-all supernet over our discovered search space, enabling the searched models to be deployed directly without extra retraining or quantization. Our discovered models establish new state-of-the-art INT8 quantized accuracy under various latency constraints, achieving up to 10.1% higher accuracy on ImageNet than prior-art CNNs under the same latency. Extensive experiments on diverse edge devices demonstrate that SpaceEvo consistently outperforms existing manually designed search spaces, with up to 2.5x faster speed at the same accuracy.