Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech

Abstract : Whereas systems based on deep learning have been proposed to learn efficient representations of emotional speech data, methods such as Bag-of-Audio-Words (BoAW) have yielded similar or even better performance while providing understandable representations of the data. In those representations, however, context information is overlooked, as the BoAW include only local information. In this paper, we propose to learn a novel representation, 'Bag-of-Context-Aware-Words', that encapsulates the context of neighbouring frames of BoAW: segment-level BoAW are extracted in the first layer and are then utilised to create a final instance-level bag. Such a hierarchical structure of BoAW enables the system to learn representations with context information. To evaluate the effectiveness of the method, we perform extensive experiments on a time- and value-continuous spontaneous emotion database, RECOLA. The results show that the best segment length for valence is twice that for arousal, suggesting that the prediction of emotional valence requires more context information than the prediction of arousal, and that the proposed Bag-of-Context-Aware-Words outperforms all previously reported results on RECOLA.
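The two-layer structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic features, the random-sampling codebooks, the hard nearest-codeword assignment, and every window and hop size are assumptions chosen only for illustration, and the helper names (`nearest_codeword`, `bag`, `sliding_bags`) are hypothetical.

```python
import numpy as np

def nearest_codeword(vectors, codebook):
    """Index of the closest codeword (squared Euclidean distance) for each vector."""
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def bag(indices, codebook_size):
    """Normalised histogram of codeword occurrences."""
    counts = np.bincount(indices, minlength=codebook_size).astype(float)
    return counts / max(counts.sum(), 1.0)

def sliding_bags(vectors, codebook, win, hop):
    """One bag per window of `win` consecutive vectors, moving by `hop`."""
    idx = nearest_codeword(vectors, codebook)
    starts = range(0, len(idx) - win + 1, hop)
    return np.stack([bag(idx[s:s + win], len(codebook)) for s in starts])

rng = np.random.default_rng(0)

# Assumption: synthetic stand-in for frame-level acoustic features (500 frames, 8 dims).
frames = rng.normal(size=(500, 8))

# Layer 1: segment-level bags over short windows of frames.
# Assumption: codebook built by randomly sampling 16 frames.
codebook1 = frames[rng.choice(len(frames), size=16, replace=False)]
segment_bags = sliding_bags(frames, codebook1, win=20, hop=10)

# Layer 2: the segment-level bags are themselves quantised and counted,
# so each final bag summarises a neighbourhood of segment-level bags.
# Assumption: codebook built by randomly sampling 8 segment-level bags.
codebook2 = segment_bags[rng.choice(len(segment_bags), size=8, replace=False)]
context_bags = sliding_bags(segment_bags, codebook2, win=6, hop=1)

print(segment_bags.shape, context_bags.shape)
```

In this sketch the second-layer window (`win=6`) plays the role of the context length; the abstract's finding that valence benefits from a longer segment than arousal would correspond to tuning such a window per target dimension.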
Document type :
Conference papers
Contributor : Fabien Ringeval
Submitted on : Friday, January 25, 2019 - 12:52:45 PM
Last modification on : Wednesday, February 13, 2019 - 1:04:59 AM


Files produced by the author(s)




Jing Han, Zixing Zhang, Maximilian Schmitt, Zhao Ren, Fabien Ringeval, et al. Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech. Interspeech 2018, Sep 2018, Hyderabad, India. pp. 3082-3086, ⟨10.21437/Interspeech.2018-996⟩. ⟨hal-01994202⟩


