Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization

Simon Leglaive; Laurent Girin; Radu Horaud

doi:10.1109/ICASSP.2019.8683704

Conference paper, Year: 2019

Simon Leglaive

Role: Author
PersonId: 20853
IdHAL: simon-leglaive
ORCID: 0000-0002-8219-1298
IdRef: 25312171X

Interpretation and Modelling of Images and Videos

Laurent Girin

Role: Author
PersonId: 3682
IdHAL: laurent-girin
ORCID: 0000-0002-9214-8760
IdRef: 088998037

GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing

Radu Horaud

Role: Author
PersonId: 16183
IdHAL: radu-horaud
ORCID: 0000-0001-5232-024X
IdRef: 032302495

Interpretation and Modelling of Images and Videos

Abstract

In this paper, we address speaker-independent multichannel speech enhancement in unknown noisy environments. Our work builds on a well-established multichannel local Gaussian modeling framework. We propose to model the spectro-temporal content of speech with a neural network, whose parameters are learned in a supervised manner using the variational autoencoder framework. Because the noisy recording environment is assumed to be unknown, the spectro-temporal noise model remains unsupervised and is based on non-negative matrix factorization (NMF). We develop a Monte Carlo expectation-maximization algorithm and show experimentally that the proposed approach outperforms its NMF-based counterpart, in which speech is modeled using supervised NMF.
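To make the modeling idea in the abstract concrete, here is a minimal NumPy sketch. It assumes the usual form of the multichannel local Gaussian model, in which each source has a time-frequency variance multiplied by a per-frequency spatial covariance matrix, and speech is estimated with a multichannel Wiener filter. All function and variable names are illustrative and do not come from the paper. `toy_decoder_variance` is a hypothetical stand-in for a trained VAE decoder, and the Monte Carlo expectation-maximization algorithm that estimates the parameters is not shown.

```python
import numpy as np

def nmf_variance(W, H):
    """Noise variance v_b[f, n] = (W @ H)[f, n] for nonnegative W (F, K), H (K, N)."""
    return W @ H

def toy_decoder_variance(Z, A, b):
    """Hypothetical stand-in for a trained VAE decoder (assumption, not the
    paper's network): maps latent vectors Z (L, N) to strictly positive
    speech variances of shape (F, N) via exp(A @ Z + b)."""
    return np.exp(A @ Z + b[:, None])

def multichannel_wiener_filter(X, v_s, v_b, R_s, R_b):
    """Estimate the speech image from the mixture STFT X of shape (F, N, I).

    v_s, v_b: (F, N) speech and noise variances.
    R_s, R_b: (F, I, I) Hermitian positive semi-definite spatial covariances.
    """
    # Per time-frequency bin covariances, shape (F, N, I, I).
    Sigma_s = v_s[..., None, None] * R_s[:, None]
    Sigma_b = v_b[..., None, None] * R_b[:, None]
    # Wiener gain: Sigma_s (Sigma_s + Sigma_b)^{-1}.
    G = Sigma_s @ np.linalg.inv(Sigma_s + Sigma_b)
    # Apply the gain matrix to the mixture vector in each bin.
    return np.einsum("fnij,fnj->fni", G, X)

# Toy dimensions: F frequencies, N frames, I microphones, K NMF components,
# L latent dimensions. All data below is random and purely illustrative.
rng = np.random.default_rng(0)
F, N, I, K, L = 4, 5, 2, 3, 2
X = rng.standard_normal((F, N, I)) + 1j * rng.standard_normal((F, N, I))
v_b = nmf_variance(rng.random((F, K)), rng.random((K, N)))
v_s = toy_decoder_variance(rng.standard_normal((L, N)),
                           rng.standard_normal((F, L)), np.zeros(F))
# Identity spatial covariances keep the toy example easy to check by hand.
R = np.broadcast_to(np.eye(I, dtype=complex), (F, I, I))
S_hat = multichannel_wiener_filter(X, v_s, v_b, R, R)
```

With identity spatial covariance matrices, as in this toy setup, the filter reduces to the scalar gain v_s / (v_s + v_b) applied independently to every channel.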

Keywords

Monte Carlo expectation-maximization; Variational autoencoders; Local Gaussian modeling; Multichannel speech enhancement; Non-negative matrix factorization

Domains

Signal and image processing [eess.SP]; Neural networks [cs.NE]

Main file

LGH-icassp2019.pdf (501.89 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-02005102

Submitted on: Friday, February 8, 2019, 11:20:54

Last modified on: Wednesday, April 3, 2024, 12:50:03

Long-term archiving on: Thursday, May 9, 2019, 14:52:13

Dates and versions

hal-02005102, version 1 (08-02-2019)

hal-02005102, version 2 (30-04-2019)

Identifiers

HAL Id: hal-02005102, version 1
arXiv: 1811.06713
DOI: 10.1109/ICASSP.2019.8683704

Cite

Simon Leglaive, Laurent Girin, Radu Horaud. Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), May 2019, Brighton, United Kingdom. pp. 101-105, ⟨10.1109/ICASSP.2019.8683704⟩. ⟨hal-02005102v1⟩

272 views

671 downloads
