Multichannel Speech Enhancement Based on Time-frequency Masking Using Subband Long Short-Term Memory

Xiaofei Li; Radu Horaud

doi:10.1109/WASPAA.2019.8937218

Conference paper, Year: 2019


Xiaofei Li

Role: Author

Interpretation and Modelling of Images and Videos

Radu Horaud

Role: Author
PersonId: 16183
IdHAL: radu-horaud
ORCID: 0000-0001-5232-024X
IdRef: 032302495

Interpretation and Modelling of Images and Videos

Abstract

We propose a multichannel speech enhancement method using a long short-term memory (LSTM) recurrent neural network. The proposed method is developed in the short-time Fourier transform (STFT) domain. A single LSTM network shared by all frequency bands is trained; it processes each frequency band individually, mapping the multichannel noisy STFT coefficient sequence to the corresponding STFT magnitude ratio mask sequence of one reference channel. This subband LSTM network exploits the differences between the temporal and spatial characteristics of speech and noise: the speech source is non-stationary and coherent, while noise is stationary and less spatially correlated. Experiments with different types of noise show that the proposed method outperforms both the baseline deep-learning-based full-band method and the unsupervised method. In addition, since it does not learn the wideband spectral structure of either speech or noise, the proposed subband LSTM network generalizes very well to unseen speakers and noise types.
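The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of stacking real and imaginary parts as input features, the clipping of the mask to [0, 1], and the toy values are all assumptions made for illustration. The trained LSTM is not modelled here; an oracle mask computed from the clean reference channel stands in for the network output.

```python
# Sketch of subband processing, assuming stft[channel][frequency][frame] holds
# complex STFT coefficients. All names and values are hypothetical.

def subband_sequences(stft):
    """Build one input sequence per frequency band.

    Each frame of a band becomes a vector of length 2 * n_channels holding the
    real and imaginary parts of that band's coefficient in every channel
    (assumed feature layout). A single shared network would be run over each
    of these sequences independently.
    """
    n_ch, n_freq, n_frames = len(stft), len(stft[0]), len(stft[0][0])
    return [
        [
            [part
             for c in range(n_ch)
             for part in (stft[c][f][t].real, stft[c][f][t].imag)]
            for t in range(n_frames)
        ]
        for f in range(n_freq)
    ]


def magnitude_ratio_mask(clean_ref, noisy_ref, eps=1e-8):
    """Training target for the reference channel: |S| / |X|, clipped to [0, 1]
    (the clipping and eps are assumptions)."""
    return [
        [min(1.0, abs(s) / (abs(x) + eps)) for s, x in zip(s_row, x_row)]
        for s_row, x_row in zip(clean_ref, noisy_ref)
    ]


def apply_mask(mask, noisy_ref):
    """Scale each noisy reference-channel coefficient by its mask value,
    keeping the noisy phase."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, noisy_ref)
    ]


# Toy example: 2 channels, 2 frequency bands, 3 frames.
clean = [[1 + 1j, 0.5j, 0j], [2 + 0j, 1 - 1j, 0.5 + 0j]]
noise0 = [[0.1 + 0j, 0.2j, 0.3 + 0j], [0.1j, -0.2 + 0j, 0.1 + 0.1j]]
noise1 = [[-0.1j, 0.1 + 0j, 0.2j], [0.2 + 0j, 0.1j, -0.1 + 0j]]

noisy = [
    [[s + n for s, n in zip(s_row, n_row)] for s_row, n_row in zip(clean, noise0)],
    [[0.9 * s + n for s, n in zip(s_row, n_row)] for s_row, n_row in zip(clean, noise1)],
]

sequences = subband_sequences(noisy)            # shape [2 bands][3 frames][4 features]
mask = magnitude_ratio_mask(clean, noisy[0])    # oracle mask, channel 0 as reference
enhanced = apply_mask(mask, noisy[0])           # enhanced reference-channel STFT
```

Because every band is fed through the same network as a separate sequence, the model never sees the full spectrum of a frame at once, which is the property the abstract credits for the generalization to unseen speakers and noise types.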

Keywords

denoising; LSTM; Speech enhancement; Speech denoising; Time-frequency masking; subband LSTM

Domains

Signal and Image Processing [eess.SP]; Machine Learning [cs.LG]; Sound [cs.SD]

Main file

Xiaofei_WASPAA2019.pdf (239.65 KB)

WASPAA2019_Presentation.pdf (1.69 MB)

Origin: Files produced by the author(s)

Contributor: Perception team

https://inria.hal.science/hal-02264247

Submitted on: Monday, 14 October 2019, 17:55:01

Last modified on: Saturday, 27 April 2024, 03:09:38

Dates and versions

hal-02264247, version 1 (6 August 2019)

hal-02264247, version 2 (14 October 2019)

Identifiers

HAL Id : hal-02264247 , version 2
DOI : 10.1109/WASPAA.2019.8937218

Cite

Xiaofei Li, Radu Horaud. Multichannel Speech Enhancement Based on Time-frequency Masking Using Subband Long Short-Term Memory. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Oct 2019, New Paltz, NY, United States. pp.298-302, ⟨10.1109/WASPAA.2019.8937218⟩. ⟨hal-02264247v2⟩


Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA INSMI LJK LJK_GI LJK_GI_PERCEPTION INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

1074 views

1251 downloads
