Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech

Abstract : Whereas systems based on deep learning have been proposed to learn efficient representations of emotional speech data, methods such as Bag-of-Audio-Words (BoAW) have yielded similar or even better performance while providing understandable representations of the data. In those representations, however, context information is overlooked, as the BoAW include only local information. In this paper, we propose to learn a novel representation, 'Bag-of-Context-Aware-Words', that encapsulates the context of neighbouring frames of BoAW: segment-level BoAW are extracted in the first layer and are then utilised to create a final instance-level bag. Such a hierarchical structure of BoAW enables the system to learn representations with context information. To evaluate the effectiveness of the method, we perform extensive experiments on a time- and value-continuous spontaneous emotion database: RECOLA. The results show that the best segment length for valence is twice that for arousal, suggesting that the prediction of emotional valence requires more context information than the prediction of arousal, and that the performance obtained on RECOLA with the proposed Bag-of-Context-Aware-Words outperforms all previously reported results.
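
The two-layer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`nearest`, `bag`, `bags_in_bag`), the nearest-codeword assignment, the non-overlapping segments, the histogram normalisation, and the assumption that both codebooks are already given are all hypothetical choices made only for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    def dist(word: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, word))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def bag(vectors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Normalised histogram of codeword assignments for a set of vectors."""
    counts = [0.0] * len(codebook)
    for vec in vectors:
        counts[nearest(vec, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def bags_in_bag(frames, frame_codebook, bag_codebook, seg_len, n_segments):
    """Two-layer bag: segment-level bags first, then bags over those bags."""
    # Layer 1 (assumed non-overlapping): one bag per segment of seg_len frames.
    segment_bags = [
        bag(frames[i:i + seg_len], frame_codebook)
        for i in range(0, len(frames) - seg_len + 1, seg_len)
    ]
    # Layer 2: for each window of n_segments neighbouring segment bags,
    # quantise the segment bags themselves and count the assignments.
    return [
        bag(segment_bags[j:j + n_segments], bag_codebook)
        for j in range(len(segment_bags) - n_segments + 1)
    ]


# Toy usage with one-dimensional frames and hand-written codebooks.
frames = [[0.0], [0.1], [0.9], [1.0], [0.0], [0.2], [1.0], [0.8]]
frame_codebook = [[0.0], [1.0]]
bag_codebook = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
context_bags = bags_in_bag(frames, frame_codebook, bag_codebook,
                           seg_len=2, n_segments=2)
```

In this sketch, `seg_len` plays the role of the segment length mentioned in the abstract; how the authors learn the codebooks and choose the windowing is not stated on this page.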
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01994202
Contributor : Fabien Ringeval
Submitted on : Friday, January 25, 2019 - 12:52:45 PM
Last modification on : Wednesday, February 13, 2019 - 1:04:59 AM

File

Han18-BIB.pdf
Files produced by the author(s)

Citation

Jing Han, Zixing Zhang, Maximilian Schmitt, Zhao Ren, Fabien Ringeval, et al. Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech. Interspeech 2018, Sep 2018, Hyderabad, India. pp.3082-3086, ⟨10.21437/Interspeech.2018-996⟩. ⟨hal-01994202⟩
