Notes on the use of variational autoencoders for speech and audio spectrogram modeling

Laurent Girin; Fanny Roche; Thomas Hueber; Simon Leglaive

Conference paper, Year: 2019

Laurent Girin

Role: Author
PersonId : 3682
IdHAL : laurent-girin
ORCID : 0000-0002-9214-8760
IdRef : 088998037

GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing

Fanny Roche

Role: Author
PersonId : 1057618

Arturia [Montbonnot-Saint-Martin]

Thomas Hueber

Role: Author
PersonId : 5965
IdHAL : thomas-hueber
ORCID : 0000-0002-8296-5177
IdRef : 143151568

GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing

Simon Leglaive

Role: Author
PersonId : 20853
IdHAL : simon-leglaive
ORCID : 0000-0002-8219-1298
IdRef : 25312171X

Interpretation and Modelling of Images and Videos

Abstract

Variational autoencoders (VAEs) are powerful deep generative neural networks. They have recently been used in several papers on speech and audio processing, in particular for modeling speech/audio spectrograms. These papers give little theoretical support for the chosen data representation and decoder likelihood function, or for the corresponding cost function used to train the VAE. Yet a well-established statistical framework exists and has been extensively presented and discussed in papers dealing with nonnegative matrix factorization (NMF) of audio spectrograms and its application to audio source separation. In the present paper, we show how this statistical framework applies to VAE-based speech/audio spectrogram modeling. This sheds light on the choice and interpretability of the data representation and model parameterization in such VAE-based models.

Domains

Signal and Image Processing [eess.SP]; Artificial Intelligence [cs.AI]

Main file

Girin_et_al_DAFx2019.pdf (623.18 KB)

Origin: Files produced by the author(s)

Contributor: Thomas Hueber

https://hal.science/hal-02349385

Submitted on: Tuesday, November 12, 2019, 15:16:47

Last modified on: Thursday, April 4, 2024, 18:24:07

Long-term archiving on: Thursday, February 13, 2020, 17:51:25

Dates and versions

hal-02349385 , version 1 (12-11-2019)

Identifiers

HAL Id : hal-02349385 , version 1

Cite

Laurent Girin, Fanny Roche, Thomas Hueber, Simon Leglaive. Notes on the use of variational autoencoders for speech and audio spectrogram modeling. DAFx 2019 - 22nd International Conference on Digital Audio Effects, Sep 2019, Birmingham, United Kingdom. pp.1-8. ⟨hal-02349385⟩

Collections

UGA CNRS INRIA GIPSA GIPSA-DPC LJK LJK_GI LJK_GI_PERCEPTION GIPSA-CRISSP INRIA2

413 views

1316 downloads
