Two Multimodal Approaches for Single Microphone Source Separation

Abstract : This paper addresses single-microphone source separation via Nonnegative Matrix Factorization (NMF) by exploiting video information. The audio and video modalities produced by a single speaker usually vary similarly over time: a change in one usually corresponds to a change in the other. The activation coefficient matrices of their NMF decompositions are therefore expected to be similar. Based on this similarity, the activation coefficient matrix of the video modality is used to initialize NMF for audio source separation. In addition, the same similarity is used as a post-processing step to cluster the rows of the activation coefficient matrix obtained from randomly initialized NMF. Simulation results confirm the effectiveness of the proposed multimodal approaches for single-microphone source separation.
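The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean cost with multiplicative updates, the correlation-based row assignment, the function names `nmf` and `assign_rows_to_sources`, and the random stand-in data are all assumptions made for illustration. It also assumes the audio and video features have been resampled to the same number of time frames.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in the updates


def nmf(V, rank, H_init=None, n_iter=200, seed=0):
    """Approximate V by W @ H using multiplicative updates (Euclidean cost).

    If H_init is given (e.g. a video activation matrix), it replaces the
    random initialization of H. This is an illustrative assumption about
    how the initialization could be done.
    """
    rng = np.random.default_rng(seed)
    n_features, n_frames = V.shape
    W = rng.random((n_features, rank)) + EPS
    if H_init is None:
        H = rng.random((rank, n_frames)) + EPS
    else:
        H = np.array(H_init, dtype=float) + EPS  # copy, caller's array is untouched
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def assign_rows_to_sources(H_audio, H_video_per_source):
    """Label each audio activation row with the index of the source whose
    video activation rows contain the most correlated row (hypothetical rule)."""
    labels = []
    for row in H_audio:
        scores = [
            max(np.corrcoef(row, v)[0, 1] for v in H_vid)
            for H_vid in H_video_per_source
        ]
        labels.append(int(np.argmax(scores)))
    return np.array(labels)


# Random nonnegative stand-ins for real features (assumption: 50 shared frames).
rng = np.random.default_rng(1)
n_frames, rank = 50, 4
V_audio = rng.random((64, n_frames))    # e.g. magnitude spectrogram of the mixture
V_video_a = rng.random((30, n_frames))  # video features of speaker A
V_video_b = rng.random((30, n_frames))  # video features of speaker B

_, H_video_a = nmf(V_video_a, rank)
_, H_video_b = nmf(V_video_b, rank)

# Approach 1: initialize the audio activations from the video activations.
W_audio, H_audio = nmf(V_audio, rank, H_init=H_video_a)

# Approach 2: random initialization, then cluster activation rows afterwards.
W_rand, H_rand = nmf(V_audio, rank)
labels = assign_rows_to_sources(H_rand, [H_video_a, H_video_b])
```

In a real separation pipeline, the rows of the activation matrix assigned to one speaker, together with the matching columns of `W_rand`, would be used to reconstruct that speaker's spectrogram; that step is omitted here.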
Document type :
Conference papers

Contributor : Bertrand Rivet
Submitted on : Tuesday, November 22, 2016 - 10:25:45 AM
Last modification on : Monday, July 8, 2019 - 3:08:11 PM
Long-term archiving on : Tuesday, March 21, 2017 - 12:34:22 AM


Files produced by the author(s)


  • HAL Id : hal-01400542, version 1


Farnaz Sedighin, Massoud Babaie-Zadeh, Bertrand Rivet, Christian Jutten. Two Multimodal Approaches for Single Microphone Source Separation. 24th European Signal Processing Conference (EUSIPCO 2016), Aug 2016, Budapest, Hungary. pp.110-114. ⟨hal-01400542⟩


