Facing Realism in Spontaneous Emotion Recognition from Speech: Feature Enhancement by Autoencoder with LSTM Neural Networks

Zixing Zhang; Fabien Ringeval; Jing Han; Jun Deng; Erik Marchi; Björn Schuller

doi:10.21437/Interspeech.2016-998

Conference paper. Year: 2016


Zixing Zhang

Role: Author

Chair of Complex and Intelligent Systems

Fabien Ringeval

Role: Author
PersonId: 13134
IdHAL: fabien-ringeval
ORCID: 0000-0002-9213-4529
IdRef: 154573078

Chair of Complex and Intelligent Systems

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Jing Han

Role: Author

Chair of Complex and Intelligent Systems

Jun Deng

Role: Author

Chair of Complex and Intelligent Systems

Erik Marchi

Role: Author

Chair of Complex and Intelligent Systems

Björn Schuller

Role: Author

Chair of Complex and Intelligent Systems

Department of Computing [London]

Abstract

During the last decade, speech emotion recognition technology has matured well enough to be used in some real-life scenarios. However, these scenarios require an almost silent environment to not compromise the performance of the system. Emotion recognition technology from speech thus needs to evolve and face more challenging conditions, such as environmental additive and convolutional noises, in order to broaden its applicability to real-life conditions. This contribution evaluates the impact of a front-end feature enhancement method based on an autoencoder with long short-term memory neural networks, for robust emotion recognition from speech. Support Vector Regression is then used as a back-end for time- and value-continuous emotion prediction from enhanced features. We perform extensive evaluations on both non-stationary additive noise and convolutional noise, on a database of spontaneous and natural emotions. Results show that the proposed method significantly outperforms a system trained on raw features, for both arousal and valence dimensions, while having almost no degradation when applied to clean speech.
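The two degradations named in the abstract act on the signal in different ways: additive noise is summed with the speech at some signal-to-noise ratio, whereas convolutional noise filters the speech through an impulse response (for example, a room or a microphone channel). The following is a minimal sketch of that distinction on a toy sine wave, using only the Python standard library. It is not the authors' experimental setup; the function names `mix_at_snr`, `convolve`, and `measured_snr_db`, the 6 dB target, and the three-tap impulse response are all illustrative assumptions.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Additive noise: scale `noise` to reach the target SNR (in dB), then add it.

    Assumes `clean` and `noise` have the same length (hypothetical helper).
    """
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


def convolve(signal, impulse_response):
    """Convolutional noise: filter `signal` with an impulse response.

    The output is truncated to the input length.
    """
    out = [0.0] * len(signal)
    for t in range(len(signal)):
        for k, h in enumerate(impulse_response):
            if t - k >= 0:
                out[t] += h * signal[t - k]
    return out


def measured_snr_db(clean, degraded):
    """SNR in dB, treating (degraded - clean) as the noise component."""
    p_clean = sum(x * x for x in clean)
    p_err = sum((y - x) ** 2 for x, y in zip(clean, degraded))
    return 10 * math.log10(p_clean / p_err)


random.seed(0)
# Toy stand-in for a speech signal: 200 samples of a sine wave.
clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
noise = [random.gauss(0.0, 1.0) for _ in range(200)]

noisy = mix_at_snr(clean, noise, snr_db=6.0)        # additive degradation
reverberant = convolve(clean, [1.0, 0.0, 0.5])      # toy echo two samples later
```

In a feature-enhancement setting such as the one the abstract describes, degraded and clean versions of the same recording would form the input and target of the enhancement model; that training step is not shown here.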

Keywords

emotion recognition; spontaneous speech; additive and convolutional noises; feature enhancement; autoencoder; LSTM Neural Networks

Domains

Neural networks [cs.NE]; Sound [cs.SD]; Information retrieval [cs.IR]

Main file

Zhang16-FRI.pdf (183.2 KB)

Origin: Files produced by the author(s)

Contributor: Fabien Ringeval

https://hal.science/hal-01494003

Submitted on: Wednesday, March 22, 2017, 14:59:03

Last modified on: Thursday, April 4, 2024, 21:10:11

Long-term archiving on: Friday, June 23, 2017, 13:12:43

Dates and versions

hal-01494003, version 1 (22-03-2017)

License

Public domain

Identifiers

HAL Id: hal-01494003, version 1
DOI: 10.21437/Interspeech.2016-998

Cite

Zixing Zhang, Fabien Ringeval, Jing Han, Jun Deng, Erik Marchi, et al. Facing Realism in Spontaneous Emotion Recognition from Speech: Feature Enhancement by Autoencoder with LSTM Neural Networks. Proceedings INTERSPEECH 2016, 17th Annual Conference of the International Speech Communication Association (ISCA), Sep 2016, San Francisco, CA, United States. pp. 3593-3597, ⟨10.21437/Interspeech.2016-998⟩. ⟨hal-01494003⟩


Collections

UGA CNRS LIG LIG_TDCGE_GETALP LIG_SIDCH

967 views

1013 downloads
