Facing Realism in Spontaneous Emotion Recognition from Speech: Feature Enhancement by Autoencoder with LSTM Neural Networks

Abstract: During the last decade, speech emotion recognition technology has matured enough to be used in some real-life scenarios. However, these scenarios require an almost silent environment so that system performance is not compromised. Speech emotion recognition technology thus needs to evolve and face more challenging conditions, such as environmental additive and convolutional noise, in order to broaden its applicability to real-life conditions. This contribution evaluates the impact of a front-end feature enhancement method, based on an autoencoder with long short-term memory neural networks, on robust emotion recognition from speech. Support Vector Regression is then used as a back-end for time- and value-continuous emotion prediction from the enhanced features. We perform extensive evaluations on both non-stationary additive noise and convolutional noise, on a database of spontaneous and natural emotions. Results show that the proposed method significantly outperforms a system trained on raw features, for both the arousal and valence dimensions, with almost no degradation when applied to clean speech.
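The two kinds of degradation named in the abstract can be illustrated with a minimal sketch. It is not the authors' corruption pipeline: the 6 dB signal-to-noise ratio, the synthetic sine "speech", the Gaussian noise and the three-tap impulse response are all assumptions chosen for illustration, and the function names are hypothetical. Additive noise is mixed in at a target SNR, while convolutional noise is modelled by filtering the signal with an impulse response.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(clean, noise, snr_db):
    """Additive noise: scale `noise` so the clean/noise power ratio is snr_db."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    target_noise_power = power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    return [c + gain * n for c, n in zip(clean, noise)]


def convolve(signal, impulse_response):
    """Convolutional noise: filter the signal with an impulse response.

    The output is truncated to the length of the input signal.
    """
    out = [0.0] * len(signal)
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if i - k < 0:
                break
            acc += h * signal[i - k]
        out[i] = acc
    return out


def measured_snr_db(clean, noisy):
    """SNR in dB of `noisy` relative to `clean`, from the residual."""
    residual = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))


# Illustrative inputs (assumptions, not data from the paper).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]

noisy = add_noise_at_snr(clean, noise, snr_db=6.0)
reverberant = convolve(clean, [1.0, 0.0, 0.5])
```

A feature enhancement front-end of the kind described above would be trained on pairs of degraded and clean inputs such as (`noisy`, `clean`), learning to map the former back to the latter.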
Document type:
Conference paper
Proceedings INTERSPEECH 2016, 17th Annual Conference of the International Speech Communication Association (ISCA), Sep 2016, San Francisco, CA, United States. pp.3593-3597, 2016, 〈http://www.interspeech2016.org/〉. 〈10.21437/Interspeech.2016-998〉
Cited literature [37 references]

https://hal.archives-ouvertes.fr/hal-01494003
Contributor: Fabien Ringeval
Submitted on: Wednesday, March 22, 2017 - 14:59:03
Last modified on: Thursday, October 11, 2018 - 08:48:03
Document(s) archived on: Friday, June 23, 2017 - 13:12:43

File

Zhang16-FRI.pdf
Files produced by the author(s)

License

Public domain

Citation

Zixing Zhang, Fabien Ringeval, Jing Han, Jun Deng, Erik Marchi, et al. Facing Realism in Spontaneous Emotion Recognition from Speech: Feature Enhancement by Autoencoder with LSTM Neural Networks. Proceedings INTERSPEECH 2016, 17th Annual Conference of the International Speech Communication Association (ISCA), Sep 2016, San Francisco, CA, United States. pp.3593-3597, 2016, 〈http://www.interspeech2016.org/〉. 〈10.21437/Interspeech.2016-998〉. 〈hal-01494003〉
