Papers
arxiv:2406.15723

Acoustic Feature Mixup for Balanced Multi-aspect Pronunciation Assessment

Published on Jun 22, 2024
Authors:
,
,

Abstract

In automated pronunciation assessment, recent emphasis progressively lies on evaluating multiple aspects to provide enriched feedback. However, acquiring multi-aspect-score labeled data for non-native language learners' speech poses challenges; moreover, it often leads to score-imbalanced distributions. In this paper, we propose two Acoustic Feature Mixup strategies, linearly and non-<PRE_TAG>linearly interpolating</POST_TAG> with the in-batch averaged feature, to address data scarcity and score-label imbalances. Primarily using goodness-of-pronunciation as an acoustic feature, we tailor mixup designs to suit pronunciation assessment. Further, we integrate fine-grained error-rate features by comparing speech recognition results with the original answer phonemes, giving direct hints for mispronunciation. Effective mixing of the acoustic features notably enhances overall scoring performances on the speechocean762 dataset, and detailed analysis highlights our potential to predict unseen distortions.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2406.15723 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2406.15723 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2406.15723 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.