Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech

Jing Han; Zixing Zhang; Maximilian Schmitt; Zhao Ren; Fabien Ringeval; Björn Schuller

doi:10.21437/Interspeech.2018-996

Conference paper, Year: 2018


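Affiliation indices, in author order: Jing Han (1), Zixing Zhang (2), Maximilian Schmitt (1), Zhao Ren (1), Fabien Ringeval (3), Björn Schuller (1, 2)

(1) Universität Augsburg
(2) Imperial College London
(3) Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole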

Jing Han

Role: Author
PersonId : 1004513

Universität Augsburg [Augsburg]

Zixing Zhang

Role: Author

Imperial College London

Maximilian Schmitt

Role: Author

Universität Augsburg [Augsburg]

Zhao Ren

Role: Author

Universität Augsburg [Augsburg]

Fabien Ringeval

Role: Author
PersonId : 13134
IdHAL : fabien-ringeval
ORCID : 0000-0002-9213-4529
IdRef : 154573078

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Björn Schuller

Role: Author

Universität Augsburg [Augsburg]

Imperial College London

Abstract

Whereas systems based on deep learning have been proposed to learn efficient representations of emotional speech data, methods such as Bag-of-Audio-Words (BoAW) have yielded similar or even better performance while providing understandable representations of the data. In those representations, however, context information is overlooked, as BoAW include only local information. In this paper, we propose to learn a novel representation, 'Bag-of-Context-Aware-Words', that encapsulates context from the neighbouring frames of BoAW: segment-level BoAW are extracted in the first layer and are then utilised to create a final instance-level bag. Such a hierarchical structure of BoAW enables the system to learn representations with context information. To evaluate the effectiveness of the method, we perform extensive experiments on a time- and value-continuous spontaneous emotion database: RECOLA. The results show that the best segment length for valence is twice that for arousal, suggesting that the prediction of emotional valence requires more context information than the prediction of arousal, and that the proposed Bag-of-Context-Aware-Words outperforms all previously reported results on RECOLA.
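The two-layer idea in the abstract can be illustrated with a minimal sketch. It assumes hard vector quantisation (each vector is assigned to its nearest codeword), codebooks drawn at random from the data, and symmetric windows clipped at the sequence edges; the helper names (`nearest`, `bag`, `sample_codebook`, `windowed_bags`), the window and codebook sizes, and the synthetic input are all hypothetical illustrations, not the authors' implementation or settings.

```python
import random
from typing import List, Sequence

Vector = List[float]


def nearest(codebook: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the codeword closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], x)),
    )


def bag(codebook: Sequence[Sequence[float]], vectors: Sequence[Sequence[float]]) -> Vector:
    """Term-frequency histogram: how often each codeword is the nearest one."""
    hist = [0.0] * len(codebook)
    for v in vectors:
        hist[nearest(codebook, v)] += 1.0
    return hist


def sample_codebook(data: Sequence[Sequence[float]], size: int, seed: int = 0) -> List[Vector]:
    """Hypothetical codebook: `size` vectors drawn at random from the data."""
    rng = random.Random(seed)
    return [list(v) for v in rng.sample(list(data), size)]


def windowed_bags(seq: Sequence[Sequence[float]], codebook, half: int) -> List[Vector]:
    """One bag per position t, built from seq[t - half .. t + half] (clipped at the edges)."""
    return [bag(codebook, seq[max(0, t - half): t + half + 1]) for t in range(len(seq))]


# Synthetic stand-in for frame-level acoustic descriptors: 50 frames x 3 features.
rng = random.Random(42)
llds = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]

# Layer 1 (assumed): a segment-level BoAW for every frame, over a 5-frame segment.
cb1 = sample_codebook(llds, 8, seed=1)
seg = windowed_bags(llds, cb1, half=2)

# Layer 2 (assumed): the segment-level bags act as "context-aware words" and are
# quantised and bagged again over a 9-position window, giving instance-level bags.
cb2 = sample_codebook(seg, 4, seed=2)
inst = windowed_bags(seg, cb2, half=4)

print(len(inst), len(inst[0]), sum(inst[25]))  # 50 4 9.0
```

Applying the same windowed bagging twice is what nests the bags: each second-layer "word" already summarises a short context of frames. In this sketch the segment length is controlled by `half` in the first call; the abstract reports that the best segment length differs between arousal and valence, but the actual values, codebook sizes, and any histogram normalisation used in the paper are not reproduced here.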

Keywords

speech analysis; emotion recognition; bag-of-audio-words; context-aware representations

Domains

Computation and Language [cs.CL]

Main file

Han18-BIB.pdf (192.22 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-01994202

Submitted on: Friday, 25 January 2019, 12:52:45

Last modified on: Thursday, 4 April 2024, 21:03:40

Long-term archiving on: Friday, 26 April 2019, 13:16:03

Dates and versions

hal-01994202 , version 1 (25-01-2019)

Identifiers

HAL Id : hal-01994202 , version 1
DOI : 10.21437/Interspeech.2018-996

Cite

Jing Han, Zixing Zhang, Maximilian Schmitt, Zhao Ren, Fabien Ringeval, et al. Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech. Interspeech 2018, Sep 2018, Hyderabad, India. pp.3082-3086, ⟨10.21437/Interspeech.2018-996⟩. ⟨hal-01994202⟩


Collections

UGA CNRS LIG LIG_TDCGE_GETALP LIG_SIDCH

