MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation
Abstract
Separation of multiple singing voices into individual voices is a rarely studied area in music source separation research, and the absence of a benchmark dataset has hindered its progress. In this paper, we present an evaluation dataset and provide baseline studies for multiple singing voices separation. First, we introduce MedleyVox, an evaluation dataset for multiple singing voices separation. We specify the problem definition in this dataset by categorizing it into i) unison, ii) duet, iii) main vs. rest, and iv) N-singing separation. Second, to overcome the absence of existing multi-singing datasets for training purposes, we present a strategy for constructing multiple singing mixtures from various single-singing datasets. Third, we propose the improved super-resolution network (iSRNet), which greatly enhances initial estimates of separation networks. Jointly trained with Conv-TasNet and the multi-singing mixture construction strategy, the proposed iSRNet achieved performance comparable to ideal time-frequency masks on the duet and unison subsets of MedleyVox. Audio samples, the dataset, and code are available on our website (https://github.com/jeonchangbin49/MedleyVox).
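To make the mixture-construction idea concrete, below is a minimal sketch of how single-singing recordings could be summed into a multi-singing training mixture. The function names, gain ranges, and the simulation of a unison partner via slight pitch shifting are illustrative assumptions for this sketch, not the paper's exact recipe.

```python
# Illustrative sketch only: the actual MedleyVox training strategy may select
# singers, segments, and augmentations differently.
import numpy as np
import librosa


def load_excerpt(path, sr=24000, duration=3.0):
    """Load a fixed-length mono excerpt from a single-singing recording."""
    audio, _ = librosa.load(path, sr=sr, mono=True, duration=duration)
    excerpt = np.zeros(int(sr * duration), dtype=np.float32)
    excerpt[: len(audio)] = audio
    return excerpt


def make_mixture(paths, sr=24000, duration=3.0, unison=False, rng=None):
    """Sum randomly gain-scaled single-singer excerpts into one mixture.

    If `unison` is True, an extra source is simulated by slightly
    pitch-shifting the first singer, a rough stand-in for multiple
    singers performing the same melody.
    """
    rng = rng or np.random.default_rng()
    sources = [load_excerpt(p, sr, duration) for p in paths]
    if unison and sources:
        shifted = librosa.effects.pitch_shift(
            sources[0], sr=sr, n_steps=rng.uniform(-0.3, 0.3)
        )
        sources.append(shifted.astype(np.float32)[: len(sources[0])])
    # Random per-source gains keep relative levels varied across examples.
    gains = rng.uniform(0.5, 1.0, size=len(sources))
    sources = [g * s for g, s in zip(gains, sources)]
    mixture = np.sum(sources, axis=0)
    # Peak-normalize mixture and targets together to avoid clipping.
    peak = np.max(np.abs(mixture)) + 1e-8
    return mixture / peak, [s / peak for s in sources]
```

In a training loop, `make_mixture` would supply the input mixture and its ground-truth sources for a separation network such as Conv-TasNet; the same construction with `unison=True` approximates the unison subset, while two or more distinct singers approximate the duet case.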