Papers
arxiv:2005.06720

Streaming keyword spotting on mobile devices

Published on May 14, 2020
Authors:
,
,
,
,

Abstract

A library for automatic conversion of non-streaming to streaming KWS models on mobile devices demonstrates tradeoffs between latency and accuracy and introduces multi-head attention models reducing classification error.

AI-generated summary

In this work we explore the latency and accuracy of keyword spotting (KWS) models in streaming and non-streaming modes on mobile phones. NN model conversion from non-streaming mode (model receives the whole input sequence and then returns the classification result) to streaming mode (model receives portion of the input sequence and classifies it incrementally) may require manual model rewriting. We address this by designing a Tensorflow/Keras based library which allows automatic conversion of non-streaming models to streaming ones with minimum effort. With this library we benchmark multiple KWS models in both streaming and non-streaming modes on mobile phones and demonstrate different tradeoffs between latency and accuracy. We also explore novel KWS models with multi-head attention which reduce the classification error over the state-of-art by 10% on Google speech commands data sets V2. The streaming library with all experiments is open-sourced.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2005.06720 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2005.06720 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2005.06720 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.