Audio-noise Power Spectral Density Estimation Using Long Short-term Memory

Xiaofei Li; Simon Leglaive; Laurent Girin; Radu Horaud

doi:10.1109/LSP.2019.2911879

Journal article, IEEE Signal Processing Letters, Year: 2019


Xiaofei Li

Role: Author

Interpretation and Modelling of Images and Videos

Simon Leglaive

Role: Author
PersonId : 20853
IdHAL : simon-leglaive
ORCID : 0000-0002-8219-1298
IdRef : 25312171X

Interpretation and Modelling of Images and Videos

Laurent Girin

Role: Author
PersonId : 3682
IdHAL : laurent-girin
ORCID : 0000-0002-9214-8760
IdRef : 088998037

GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing

Interpretation and Modelling of Images and Videos

Radu Horaud

Role: Author
PersonId : 16183
IdHAL : radu-horaud
ORCID : 0000-0001-5232-024X
IdRef : 032302495

Interpretation and Modelling of Images and Videos

Abstract

We propose a method that uses a long short-term memory (LSTM) network to estimate the noise power spectral density (PSD) of single-channel audio signals represented in the short-time Fourier transform (STFT) domain. A single LSTM network, shared by all frequency bands, is trained; it processes each frequency band individually by mapping the noisy STFT magnitude sequence to the corresponding noise PSD sequence. Unlike deep-learning-based speech enhancement methods that learn the full-band spectral structure of speech segments, the proposed method exploits the sub-band STFT magnitude evolution of noise with a long time dependency, in the spirit of the unsupervised noise estimators described in the literature. Speaker- and speech-independent experiments with different types of noise show that the proposed method outperforms the unsupervised estimators and generalizes well to noise types that are not present in the training set.
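The band-wise processing described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the hidden size, the log-magnitude input, the exponential output layer, and the random (untrained) weights are all assumptions made for illustration, and the names `init_params` and `lstm_noise_psd` are hypothetical. The sketch only shows how one set of LSTM weights can be applied to every frequency band independently by treating the bands as a batch of one-dimensional sequences.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(hidden_size, rng):
    """Random, untrained weights for a one-layer LSTM with scalar input.

    Hypothetical helper: a real model would learn these from noisy/noise pairs.
    """
    scale = 0.1
    return {
        # Stacked weights for the input, forget, cell and output gates.
        "W": rng.normal(0.0, scale, (4 * hidden_size, 1 + hidden_size)),
        "b": np.zeros(4 * hidden_size),
        # Linear read-out from the hidden state to one value per frame.
        "w_out": rng.normal(0.0, scale, hidden_size),
        "b_out": 0.0,
    }


def lstm_noise_psd(mag, params):
    """Map noisy STFT magnitudes (F bands x T frames) to positive estimates.

    The same weights are used for every band; bands never interact,
    so the F axis acts as a batch of independent sequences.
    """
    num_bands, num_frames = mag.shape
    hidden_size = params["w_out"].shape[0]
    h = np.zeros((num_bands, hidden_size))  # hidden state, one row per band
    c = np.zeros((num_bands, hidden_size))  # cell state, one row per band
    out = np.empty((num_bands, num_frames))
    for t in range(num_frames):
        # Assumption: log-magnitude input; the small constant avoids log(0).
        x = np.log(mag[:, t:t + 1] + 1e-8)
        z = np.concatenate([x, h], axis=1) @ params["W"].T + params["b"]
        i, f, g, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        # Assumption: exponential read-out keeps the PSD estimate positive.
        out[:, t] = np.exp(h @ params["w_out"] + params["b_out"])
    return out


rng = np.random.default_rng(0)
params = init_params(hidden_size=8, rng=rng)
noisy_mag = np.abs(rng.normal(size=(5, 20)))  # toy input: 5 bands, 20 frames
noise_psd = lstm_noise_psd(noisy_mag, params)
print(noise_psd.shape)  # (5, 20)
```

Because the recurrence runs forward in time and each row is updated only from its own history, the estimate for a given band and frame depends solely on that band's past and present magnitudes, which is the sub-band, long-time-dependency view the abstract contrasts with full-band methods.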

Keywords

LSTM; Noise PSD; Speech enhancement

Domains

Signal and Image Processing [eess.SP]; Machine Learning [cs.LG]; Sound [cs.SD]

Main file

noise_psd.pdf (279.3 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-02100059

Submitted on: Monday, April 15, 2019, 15:05:45

Last modified on: Thursday, April 4, 2024, 21:30:26

Dates and versions

hal-02100059, version 1 (15-04-2019)

Identifiers

HAL Id: hal-02100059, version 1
arXiv: 1904.05166
DOI: 10.1109/LSP.2019.2911879

Cite

Xiaofei Li, Simon Leglaive, Laurent Girin, Radu Horaud. Audio-noise Power Spectral Density Estimation Using Long Short-term Memory. IEEE Signal Processing Letters, 2019, 26 (6), pp.918-922. ⟨10.1109/LSP.2019.2911879⟩. ⟨hal-02100059⟩


Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA GIPSA GIPSA-DPC LJK LJK_GI LJK_GI_PERCEPTION GIPSA-CRISSP INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

194 views

923 downloads
