A Recurrent Variational Autoencoder for Speech Enhancement

This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE). The deep generative speech model is trained using clean speech signals only, and it is combined with a nonnegative matrix factorization noise model for speech enhancement. We propose a variational expectation-maximization algorithm where the encoder of the RVAE is fine-tuned at test time, to approximate the distribution of the latent variables given the noisy speech observations. Compared with previous approaches based on feed-forward fully-connected architectures, the proposed recurrent deep generative speech model induces a posterior temporal dynamic over the latent variables, which is shown to improve the speech enhancement results.

Domains

Sound [cs.SD]; Signal and Image Processing [eess.SP]; Neural Networks [cs.NE]; Artificial Intelligence [cs.AI]

Main file

LAGH_2019.pdf (391.16 KB)

Origin: Files produced by the author(s)

Contributor: Simon Leglaive

https://hal.science/hal-02329000

Submitted on: Wednesday, October 23, 2019, 13:52:42

Last modified on: Friday, July 21, 2023, 13:12:03

Long-term archiving on: Friday, January 24, 2020, 17:58:30

Dates and versions

hal-02329000, version 1 (23-10-2019)

hal-02329000, version 2 (07-02-2020)

Identifiers

HAL Id: hal-02329000, version 1
arXiv: 1910.10942

Cite

Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud. A Recurrent Variational Autoencoder for Speech Enhancement. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2020, Barcelona, Spain. ⟨hal-02329000v1⟩
