Facing Realism in Spontaneous Emotion Recognition from Speech: Feature Enhancement by Autoencoder with LSTM Neural Networks

Abstract: During the last decade, speech emotion recognition technology has matured enough to be used in some real-life scenarios. However, these scenarios require an almost silent environment so as not to compromise the performance of the system. Emotion recognition from speech thus needs to evolve and face more challenging conditions, such as environmental additive and convolutional noise, in order to broaden its applicability to real-life conditions. This contribution evaluates the impact of a front-end feature enhancement method, based on an autoencoder with long short-term memory neural networks, on robust emotion recognition from speech. Support Vector Regression is then used as a back-end for time- and value-continuous emotion prediction from the enhanced features. We perform extensive evaluations on both non-stationary additive noise and convolutional noise, on a database of spontaneous and natural emotions. Results show that the proposed method significantly outperforms a system trained on raw features, for both the arousal and valence dimensions, with almost no degradation when applied to clean speech.
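The two degradation types named in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: the function names, the synthetic sine "speech" signal, the white-noise source, the toy decaying impulse response, and the 6 dB SNR are all assumptions chosen for illustration. It only shows how additive noise is typically mixed at a target signal-to-noise ratio and how convolutional noise is applied by filtering with an impulse response.

```python
import numpy as np


def add_noise_at_snr(clean, noise, snr_db):
    """Mix `noise` into `clean` so that the mixture has the requested SNR (dB)."""
    # Repeat or truncate the noise so it matches the clean signal length.
    noise = np.resize(noise, clean.shape)
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose the gain so that clean_power / (gain**2 * noise_power)
    # equals 10 ** (snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise


def apply_convolutional_noise(clean, impulse_response):
    """Filter `clean` with an impulse response, keeping the original length."""
    return np.convolve(clean, impulse_response, mode="full")[: len(clean)]


# Hypothetical inputs: a 1 s sine at 16 kHz stands in for speech.
rng = np.random.default_rng(0)
sample_rate = 16000
clean = np.sin(2 * np.pi * 220 * np.arange(sample_rate) / sample_rate)
noise = rng.standard_normal(sample_rate // 2)  # shorter than the signal on purpose

noisy = add_noise_at_snr(clean, noise, snr_db=6.0)

# Toy impulse response: exponentially decaying random taps (an assumption,
# not a measured room or microphone response).
impulse_response = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)
reverberant = apply_convolutional_noise(clean, impulse_response)

measured_snr_db = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

In a setup like the one the abstract describes, acoustic features would be extracted from the degraded signals and passed to the enhancement front-end; that stage is not sketched here.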


https://hal.archives-ouvertes.fr/hal-01494003
Contributor: Fabien Ringeval
Submitted on: Wednesday, March 22, 2017 - 2:59:03 PM
Last modification on: Tuesday, February 12, 2019 - 1:31:21 AM
Archived on: Friday, June 23, 2017 - 1:12:43 PM

File

Zhang16-FRI.pdf
Files produced by the author(s)

Licence


Public Domain

Citation

Zixing Zhang, Fabien Ringeval, Jing Han, Jun Deng, Erik Marchi, et al. Facing Realism in Spontaneous Emotion Recognition from Speech: Feature Enhancement by Autoencoder with LSTM Neural Networks. Proceedings INTERSPEECH 2016, 17th Annual Conference of the International Speech Communication Association (ISCA), Sep 2016, San Francisco, CA, United States. pp.3593-3597, ⟨10.21437/Interspeech.2016-998⟩. ⟨hal-01494003⟩
