Conference paper. Year: 2019

Notes on the use of variational autoencoders for speech and audio spectrogram modeling

Abstract

Variational autoencoders (VAEs) are powerful (deep) generative artificial neural networks. They have recently been used in several papers for speech and audio processing, in particular for modeling speech/audio spectrograms. In these papers, little theoretical support is given to justify the chosen data representation and decoder likelihood function, or the corresponding cost function used for training the VAE. Yet a solid theoretical statistical framework exists and has been extensively presented and discussed in papers dealing with nonnegative matrix factorization (NMF) of audio spectrograms and its application to audio source separation. In the present paper, we show how this statistical framework applies to VAE-based speech/audio spectrogram modeling. This provides insight into the choice and interpretability of the data representation and model parameterization.
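The framework referred to in the abstract ties the choice of spectral representation to the decoder likelihood, and hence to the reconstruction term of the VAE training objective. The sketch below is a purely illustrative, hedged example (not taken from the paper): it assumes a zero-mean circular complex Gaussian model of the STFT coefficients whose variance is produced by the decoder, in which case the negative reconstruction log-likelihood reduces, up to a data-dependent constant, to the Itakura-Saito divergence between the observed power spectrogram and the decoder output, a correspondence well known from the NMF literature. All names and dimensions (SpectrogramVAE, n_freq, etc.) are hypothetical.

```python
# Minimal sketch (PyTorch) of a VAE over power spectrogram frames, with an
# Itakura-Saito reconstruction term. Illustrative only; not the paper's code.
import torch
import torch.nn as nn

class SpectrogramVAE(nn.Module):
    def __init__(self, n_freq=513, n_latent=16, n_hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_freq, n_hidden), nn.Tanh())
        self.enc_mu = nn.Linear(n_hidden, n_latent)
        self.enc_logvar = nn.Linear(n_hidden, n_latent)
        self.dec = nn.Sequential(nn.Linear(n_latent, n_hidden), nn.Tanh(),
                                 nn.Linear(n_hidden, n_freq))

    def forward(self, x_pow):
        # x_pow: power spectrogram frames, shape (batch, n_freq), strictly positive
        h = self.enc(x_pow)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        v = torch.exp(self.dec(z))  # decoder outputs log-variance -> positive variance
        return v, mu, logvar

def neg_elbo(x_pow, v, mu, logvar):
    # Itakura-Saito reconstruction term: equals the negative log-likelihood of
    # x_pow under the complex Gaussian (i.e. exponential power) model with
    # variance v, up to an additive constant that does not depend on v.
    ratio = x_pow / v
    recon = (ratio - torch.log(ratio) - 1.0).sum(dim=-1)
    # KL divergence between the Gaussian posterior q(z|x) and a standard normal prior.
    kl = -0.5 * (1.0 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)
    return (recon + kl).mean()

# Usage example with random positive "spectrogram" frames:
# model = SpectrogramVAE()
# x_pow = torch.rand(8, 513) + 1e-6
# loss = neg_elbo(x_pow, *model(x_pow))
```

Choosing instead a log-magnitude representation with a Gaussian decoder would turn the reconstruction term into a squared-error cost, which is one of the representation/likelihood pairings the paper's framework makes explicit.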
Main file
Girin_et_al_DAFx2019.pdf (623.18 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02349385, version 1 (12-11-2019)

Identifiers

  • HAL Id: hal-02349385, version 1

Cite

Laurent Girin, Fanny Roche, Thomas Hueber, Simon Leglaive. Notes on the use of variational autoencoders for speech and audio spectrogram modeling. DAFx 2019 - 22nd International Conference on Digital Audio Effects, Sep 2019, Birmingham, United Kingdom. pp.1-8. ⟨hal-02349385⟩
413 Views
1316 Downloads
