Journal article

Fusion methods for speech enhancement and audio source separation

Abstract: A wide variety of audio source separation techniques exist and can already tackle many challenging industrial issues. However, in contrast to other application domains, fusion principles have rarely been investigated in audio source separation despite their demonstrated potential in classification tasks. In this paper, we propose a general fusion framework that takes advantage of the diversity of existing separation techniques in order to improve separation quality. We obtain new source estimates by summing the individual estimates given by different separation techniques, weighted by a set of fusion coefficients. We investigate three alternative fusion methods, based on standard non-linear optimization, Bayesian model averaging, and deep neural networks. Experiments conducted for both speech enhancement and singing voice extraction demonstrate that all the proposed methods outperform traditional model selection. The use of deep neural networks for the estimation of time-varying coefficients notably leads to large quality improvements, up to 3 dB in terms of signal-to-distortion ratio (SDR) compared to model selection.
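The weighted-sum fusion described in the abstract can be sketched as follows. This is a minimal illustration only: the function name and the normalization of the coefficients are my own choices, and the paper additionally considers time-varying coefficients estimated by a deep neural network rather than a single fixed weight per separator.

```python
import numpy as np

def fuse_estimates(estimates, weights):
    """Fuse source estimates from several separators by a weighted sum.

    estimates: list of arrays, one per separation technique, all the same shape
               (e.g. time-domain signals or magnitude spectrograms)
    weights:   one fusion coefficient per technique (time-invariant here)
    """
    stacked = np.stack(estimates)                 # (n_techniques, ...)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                               # normalize coefficients to sum to 1
    return np.tensordot(w, stacked, axes=1)       # weighted sum over techniques

# Toy example: two "separators" disagree; uniform fusion averages them.
est_a = np.array([1.0, 2.0, 3.0])
est_b = np.array([3.0, 2.0, 1.0])
fused = fuse_estimates([est_a, est_b], [0.5, 0.5])
# fused == [2.0, 2.0, 2.0]
```

Model selection corresponds to the special case where one coefficient is 1 and the rest are 0; the fusion framework generalizes this by letting the coefficients take intermediate values.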

Cited literature: 55 references
Contributor: Xabier Jaureguiberry
Submitted on: Saturday, April 9, 2016 - 11:39:20 AM
Last modification on: Monday, May 18, 2020 - 8:24:01 PM
Document(s) archived on: Tuesday, November 15, 2016 - 12:14:14 AM





Xabier Jaureguiberry, Emmanuel Vincent, Gaël Richard. Fusion methods for speech enhancement and audio source separation. IEEE Transactions on Audio, Speech and Language Processing, 2016. ⟨10.1109/TASLP.2016.2553441⟩ ⟨hal-01120685v4⟩


