fronx/Fast-FullSubNet

This is a pre-trained version of Fast FullSubNet, a real-time denoising model trained on the Deep Noise Suppression Challenge dataset of 2020 (DNS-INTERSPEECH-2020).

How to run

https://fullsubnet.readthedocs.io/en/latest/usage/getting_started.html

Code

https://github.com/Audio-WestlakeU/FullSubNet

Note: The code doesn't support real-time streaming out of the box. See issue-67 for details.

Paper

Fast FullSubNet: Accelerate Full-band and Sub-band Fusion Model for Single-channel Speech Enhancement, Xiang Hao, Xiaofei Li

For many speech enhancement applications, a key feature is that the system runs on a real-time, latency-sensitive, battery-powered platform, which strictly limits the algorithm latency and computational complexity. In this work, we propose a new architecture named Fast FullSubNet, dedicated to accelerating the computation of FullSubNet. Specifically, Fast FullSubNet processes sub-band speech spectra in the mel-frequency domain by using cascaded linear-to-mel full-band, sub-band, and mel-to-linear full-band models, so that the number of frequencies involved in the sub-band computation is vastly reduced. In addition, a down-sampling operation is applied to the sub-band input sequence to further reduce the computational complexity along the time axis. Experimental results show that, compared to FullSubNet, Fast FullSubNet requires only 13% of the computational complexity and 16% of the processing time, and achieves comparable or even better performance.
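The two reductions described in the abstract (fewer frequencies via a mel-domain representation, fewer frames via time down-sampling) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the fixed triangular mel filterbank, the plain frame decimation, and all sizes (`N_FREQS = 257`, `N_MELS = 64`, `FACTOR = 2`, 16 kHz sampling rate) are assumptions chosen for illustration. The sketch only counts how many (frequency, frame) units a sub-band stage would have to visit before and after the reduction.

```python
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_freqs, sample_rate):
    """Triangular mel filters as a nested list of shape [n_mels][n_freqs]."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # n_mels + 2 edge points equally spaced on the mel scale, mapped back to Hz.
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    # Centre frequency of each linear STFT bin.
    bin_hz = [nyquist * k / (n_freqs - 1) for k in range(n_freqs)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

def linear_to_mel(spec, bank):
    """Map spec [n_freqs][n_frames] to [n_mels][n_frames] with the filterbank."""
    n_frames = len(spec[0])
    return [
        [sum(w * spec[k][t] for k, w in enumerate(row) if w > 0.0)
         for t in range(n_frames)]
        for row in bank
    ]

def downsample_time(spec, factor):
    """Keep every `factor`-th frame (plain decimation, an assumed stand-in)."""
    return [row[::factor] for row in spec]

# Hypothetical sizes, not taken from the paper or the repository.
N_FREQS, N_FRAMES, N_MELS, FACTOR, SAMPLE_RATE = 257, 100, 64, 2, 16000

spec = [[1.0] * N_FRAMES for _ in range(N_FREQS)]   # dummy magnitude spectrogram
bank = mel_filterbank(N_MELS, N_FREQS, SAMPLE_RATE)
mel = downsample_time(linear_to_mel(spec, bank), FACTOR)

before = N_FREQS * N_FRAMES          # units visited on the linear-frequency grid
after = len(mel) * len(mel[0])       # units visited after both reductions
print(before, after)
```

With these assumed sizes the grid shrinks from 25,700 units to 3,200. That count is only a rough proxy for where the savings come from; it is not the complexity measurement reported in the paper.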

Performance

Method	With Reverb WB-PESQ	With Reverb NB-PESQ	With Reverb SI-SDR (dB)	With Reverb STOI	No Reverb WB-PESQ	No Reverb NB-PESQ	No Reverb SI-SDR (dB)
Fast FullSubNet (118 Epochs)	2.882	3.42	15.33	0.9233	2.694	3.222	16.34
FullSubNet (58 Epochs) (just for comparison)	2.987	3.496	15.756	0.926	2.889	3.385	17.635
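To read the gap between the two checkpoints at a glance, the rows above can be subtracted metric by metric. The short sketch below does only that arithmetic on the values listed in the table; the metric labels are shorthand introduced here, and the two models were trained for different numbers of epochs, so the differences are indicative rather than a controlled comparison.

```python
# Values copied from the table above; labels are shorthand for its columns.
METRICS = [
    "WB-PESQ (reverb)", "NB-PESQ (reverb)", "SI-SDR (reverb)", "STOI (reverb)",
    "WB-PESQ (no reverb)", "NB-PESQ (no reverb)", "SI-SDR (no reverb)",
]
fast_fullsubnet = [2.882, 3.42, 15.33, 0.9233, 2.694, 3.222, 16.34]
fullsubnet = [2.987, 3.496, 15.756, 0.926, 2.889, 3.385, 17.635]

# Fast FullSubNet minus FullSubNet; negative means the fast model scores lower.
gaps = {
    name: round(fast - base, 4)
    for name, fast, base in zip(METRICS, fast_fullsubnet, fullsubnet)
}

for name, gap in gaps.items():
    print(f"{name:>20}: {gap:+.4f}")
```

For these two checkpoints every difference is negative, so the fast model scores slightly lower on each listed metric.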