A variance modeling framework based on variational autoencoders for speech enhancement

Simon Leglaive 1 Laurent Girin 2 Radu Horaud 1
1 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
2 GIPSA-CRISSP - CRISSP
GIPSA-DPC - Département Parole et Cognition
Abstract: In this paper, we address the problem of enhancing speech signals in noisy mixtures using a source separation approach. We explore the use of neural networks as an alternative to a popular speech variance model based on supervised non-negative matrix factorization (NMF). More precisely, we use a variational autoencoder as a speaker-independent supervised generative speech model, and we highlight the conceptual similarities between this approach and its NMF-based counterpart. To avoid generalization issues across noisy recording environments, we use a supervised model only for the target speech signal, while the noise model is based on unsupervised NMF. We develop a Monte Carlo expectation-maximization algorithm for inferring the latent variables in the variational autoencoder and estimating the unsupervised model parameters. Experiments show that the proposed method outperforms a semi-supervised NMF baseline and a state-of-the-art fully supervised deep learning approach.
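As a rough illustration of what "variance modeling" means in this setting, the sketch below builds a noise variance from an NMF product and combines it with a speech variance to form a Wiener-like gain. The abstract does not give the equations, so this assumes the usual Gaussian setting in which each time-frequency coefficient of speech and noise is zero-mean with its own variance. The function names and toy values are hypothetical, and the speech variance, which in the proposed framework would come from the variational autoencoder, is replaced here by a fixed stand-in.

```python
def nmf_variance(W, H):
    """Return the F x N variance matrix V = W H for nonnegative W (F x K) and H (K x N)."""
    K = len(H)
    N = len(H[0])
    return [[sum(row[k] * H[k][n] for k in range(K)) for n in range(N)] for row in W]


def wiener_gain(speech_var, noise_var):
    """Per-bin gain speech_var / (speech_var + noise_var), a value between 0 and 1."""
    return [[s / (s + b) for s, b in zip(s_row, b_row)]
            for s_row, b_row in zip(speech_var, noise_var)]


# Hypothetical toy values: 2 frequency bins, 2 time frames, 1 NMF component.
W_noise = [[1.0], [3.0]]   # spectral template (F x K)
H_noise = [[1.0, 2.0]]     # temporal activations (K x N)
noise_var = nmf_variance(W_noise, H_noise)   # [[1.0, 2.0], [3.0, 6.0]]

# Assumption: stand-in for the speech variance a trained speech model would output.
speech_var = [[3.0, 2.0], [1.0, 2.0]]

gain = wiener_gain(speech_var, noise_var)    # [[0.75, 0.5], [0.25, 0.25]]
```

Multiplying the noisy time-frequency coefficients by such a gain yields a speech estimate; bins where the modeled speech variance dominates are kept, and bins dominated by noise are attenuated.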
Document type:
Conference paper
MLSP 2018 - IEEE International Workshop on Machine Learning for Signal Processing, Sep 2018, Aalborg, Denmark. IEEE, pp.1-6, 2018


https://hal.inria.fr/hal-01832826
Contributor: Simon Leglaive
Submitted on: Thursday, July 12, 2018 - 11:24:57
Last modified on: Friday, November 16, 2018 - 16:01:10
Document(s) archived on: Monday, October 15, 2018 - 22:45:33

File

LGH_MLSP2018_final.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01832826, version 1

Citation

Simon Leglaive, Laurent Girin, Radu Horaud. A variance modeling framework based on variational autoencoders for speech enhancement. MLSP 2018 - IEEE International Workshop on Machine Learning for Signal Processing, Sep 2018, Aalborg, Denmark. IEEE, pp.1-6, 2018. 〈hal-01832826〉


Metrics

Record views

441

File downloads

337