Conference papers

Notes on the use of variational autoencoders for speech and audio spectrogram modeling

Laurent Girin 1, Fanny Roche 2, Thomas Hueber 1, Simon Leglaive 3
3 PERCEPTION [2016-2019] - Interpretation and Modelling of Images and Videos [2016-2019]
Inria Grenoble - Rhône-Alpes, LJK [2016-2019] - Laboratoire Jean Kuntzmann [2016-2019], Grenoble INP [2007-2019] - Institut polytechnique de Grenoble - Grenoble Institute of Technology [2007-2019]
Abstract: Variational autoencoders (VAEs) are powerful (deep) generative artificial neural networks. They have recently been used in several papers for speech and audio processing, in particular for the modeling of speech/audio spectrograms. These papers give little theoretical support for the chosen data representation and decoder likelihood function, or for the corresponding cost function used to train the VAE. Yet a sound statistical framework exists and has been extensively presented and discussed in papers dealing with nonnegative matrix factorization (NMF) of audio spectrograms and its application to audio source separation. In the present paper, we show how this statistical framework applies to VAE-based speech/audio spectrogram modeling. This provides insights into the choice and interpretability of the data representation and model parameterization for VAE-based modeling.

Cited literature: 40 references
Contributor: Thomas Hueber
Submitted on: Tuesday, November 12, 2019 - 3:16:47 PM
Last modification on: Friday, August 7, 2020 - 3:14:07 AM
Long-term archiving on: Thursday, February 13, 2020 - 5:51:25 PM




  • HAL Id: hal-02349385, version 1



Laurent Girin, Fanny Roche, Thomas Hueber, Simon Leglaive. Notes on the use of variational autoencoders for speech and audio spectrogram modeling. DAFx 2019 - 22nd International Conference on Digital Audio Effects, Sep 2019, Birmingham, United Kingdom. pp.1-8. ⟨hal-02349385⟩


