Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function

Xiaofei Li; Laurent Girin; Sharon Gannot; Radu Horaud

doi:10.1109/TASLP.2019.2892412

Journal article, IEEE/ACM Transactions on Audio, Speech and Language Processing, Year: 2019

Xiaofei Li

Role: Author

Interpretation and Modelling of Images and Videos

Laurent Girin

Role: Author
PersonId : 3682
IdHAL : laurent-girin
ORCID : 0000-0002-9214-8760
IdRef : 088998037

GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing

Interpretation and Modelling of Images and Videos

Sharon Gannot

Role: Author

Bar-Ilan University [Israel]

Radu Horaud

Role: Author
PersonId : 16183
IdHAL : radu-horaud
ORCID : 0000-0001-5232-024X
IdRef : 032302495

Interpretation and Modelling of Images and Videos

Abstract

This paper addresses the problem of speech separation and enhancement from multichannel convolutive and noisy mixtures, assuming known mixing filters. We propose to perform the speech separation and enhancement task in the short-time Fourier transform domain, using the convolutive transfer function (CTF) approximation. Compared to time-domain filters, the CTF has far fewer taps; consequently it has fewer near-common zeros among channels and a lower computational complexity. The work proposes three speech-source recovery methods: i) multichannel inverse filtering, i.e. the multiple-input/output inverse theorem (MINT), applied in the CTF domain and for the multi-source case; ii) a beamforming-like multichannel inverse filtering method that applies single-source MINT and uses power minimization, which is suitable whenever the source CTFs are not all known; and iii) a constrained Lasso method, where the sources are recovered by minimizing their $\ell_1$-norm to impose spectral sparsity, under the constraint that the $\ell_2$-norm fitting cost between the microphone signals and the mixing model involving the unknown source signals is less than a tolerance. Noise can be reduced by setting this tolerance in relation to the noise power. Experiments under various acoustic conditions are carried out to evaluate the three proposed methods, which are compared with one another and with baseline methods.
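To make method i) more concrete, the following is a minimal sketch of single-source MINT in one frequency bin, where each channel's CTF is a short complex filter along the frame axis. It is an illustration under stated assumptions, not the authors' implementation: the CTFs are random and exactly known, the mixture is noise-free, and the function names `conv_matrix` and `mint_inverse_filters` and all sizes are hypothetical. The paper's actual formulation may differ, for example in how the target response and regularization are chosen.

```python
import numpy as np

def conv_matrix(a, n_cols):
    """Return C of shape (len(a) + n_cols - 1, n_cols) with C @ h == np.convolve(a, h)."""
    q = len(a)
    c = np.zeros((q + n_cols - 1, n_cols), dtype=complex)
    for j in range(n_cols):
        c[j:j + q, j] = a  # column j holds a, shifted down by j frames
    return c

def mint_inverse_filters(ctfs, inv_len):
    """Least-squares MINT for one frequency bin (hypothetical helper).

    ctfs: (n_channels, Q) complex array, one CTF per channel.
    Returns (n_channels, inv_len) filters h_i such that sum_i a_i * h_i
    approximates a unit impulse (* denotes convolution along frames).
    """
    n_ch, q = ctfs.shape
    big_a = np.hstack([conv_matrix(ctfs[i], inv_len) for i in range(n_ch)])
    target = np.zeros(q + inv_len - 1, dtype=complex)
    target[0] = 1.0  # desired overall response: impulse with no delay
    h, *_ = np.linalg.lstsq(big_a, target, rcond=None)
    return h.reshape(n_ch, inv_len)

# Toy setup (all values are assumptions chosen for illustration).
rng = np.random.default_rng(0)
n_ch, q, inv_len, n_frames = 4, 6, 3, 50
ctfs = rng.standard_normal((n_ch, q)) + 1j * rng.standard_normal((n_ch, q))
source = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)

# Noise-free microphone STFT coefficients in this bin: x_i = a_i * s.
mics = np.stack([np.convolve(ctfs[i], source) for i in range(n_ch)])

filters = mint_inverse_filters(ctfs, inv_len)
overall = sum(np.convolve(ctfs[i], filters[i]) for i in range(n_ch))
estimate = sum(np.convolve(filters[i], mics[i]) for i in range(n_ch))[:n_frames]
```

With 4 channels, 6 CTF taps and inverse filters of length 3, the stacked system has 8 equations and 12 unknowns, so an exact inverse generally exists when the channels share no common zeros. This is where the abstract's point about short CTFs matters: fewer taps mean a smaller system to solve per frequency bin.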

Keywords

Audio source separation; speech enhancement; short-time Fourier transform; convolutive transfer function; MINT; Lasso optimization

Domains

Signal and Image Processing [eess.SP]; Machine Learning [cs.LG]; Sound [cs.SD]

Main file

bss_mint.pdf (547.6 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01799809

Submitted on: Friday, March 1, 2019, 17:26:04

Last modified on: Thursday, April 4, 2024, 21:30:26

Long-term archiving on: Thursday, May 30, 2019, 16:24:04

Dates and versions

hal-01799809 , version 1 (01-03-2019)

Identifiers

HAL Id : hal-01799809 , version 1
ARXIV : 1711.07911
DOI : 10.1109/TASLP.2019.2892412

Cite

Xiaofei Li, Laurent Girin, Sharon Gannot, Radu Horaud. Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2019, 27 (3), pp.645-659. ⟨10.1109/TASLP.2019.2892412⟩. ⟨hal-01799809⟩

Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA GIPSA GIPSA-DPC LJK LJK_GI LJK_GI_PERCEPTION GIPSA-CRISSP INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM
