Multi-corpus Experiment on Continuous Speech Emotion Recognition: Convolution or Recurrence?

Manon Macary; Martin Lebourdais; Marie Tahon; Yannick Estève; Anthony Rousseau

Conference paper, Year: 2020


Manon Macary

Role: Author
PersonId : 184326
IdHAL : manon-macary

Laboratoire d'Informatique de l'Université du Mans

Allo-Media

Martin Lebourdais

Role: Author
PersonId : 1161742
IdHAL : martin-lebourdais
ORCID : 0000-0001-7150-0588

Laboratoire d'Informatique de l'Université du Mans

Marie Tahon

Role: Author
PersonId : 9821
IdHAL : marie-tahon
ORCID : 0000-0002-6782-0332
IdRef : 165065532

Laboratoire d'Informatique de l'Université du Mans

Yannick Estève

Role: Author
PersonId : 11645
IdHAL : yannick-esteve
ORCID : 0000-0002-3656-8883
IdRef : 070531668

Laboratoire Informatique d'Avignon

Anthony Rousseau

Role: Author
PersonId : 1072625

Allo-Media

Abstract

Extracting semantic information such as emotions from real-life speech is a challenging task that has grown in popularity over the last few years. Recently, emotion processing in speech has moved from discrete emotional categories to continuous affective dimensions. This trend helps in the design of systems that predict the dynamic evolution of affect in speech. However, no standard annotation guidelines exist for these dimensions, which makes cross-corpus studies hard to carry out. Deep neural networks now dominate the task of emotion recognition. Almost all systems use recurrent architectures, but convolutional networks have recently been reassessed because they are faster to train and have fewer parameters than recurrent ones. This paper investigates the pros and cons of these two architectures using cross-corpus experiments that highlight the issue of corpus variability. We also explore the most suitable acoustic representation for continuous emotion recognition, together with loss functions. We conclude that recurrent networks are robust to corpus variability, and we confirm the power of cepstral features for continuous Speech Emotion Recognition (SER), especially for satisfaction prediction. A final post-processing step applied to the predictions yields a concordance correlation coefficient (CCC) of 0.719 on AlloSat, a new state of the art.
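The headline score in the abstract is a CCC value, usually read as the concordance correlation coefficient, the common agreement measure for continuous emotion traces. The snippet below is a minimal sketch of that standard formula, not the authors' evaluation code. The function name `concordance_cc` and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np


def concordance_cc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D sequences.

    Standard definition (population statistics):
        CCC = 2 * cov(t, p) / (var(t) + var(p) + (mean(t) - mean(p))**2)
    Hypothetical helper for illustration; not taken from the paper.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    mean_t, mean_p = y_true.mean(), y_pred.mean()
    # Population variance (ddof=0), consistent with the covariance below.
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))

    # The squared mean difference penalizes a constant offset, which
    # Pearson correlation alone would ignore.
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


reference = [1.0, 2.0, 3.0]
identical = concordance_cc(reference, [1.0, 2.0, 3.0])  # perfect agreement
shifted = concordance_cc(reference, [2.0, 3.0, 4.0])    # same shape, offset by 1
```

With these toy values, the identical prediction scores 1 while the shifted one scores 4/7, even though its Pearson correlation with the reference is 1. This sensitivity to bias and scale is why `1 - CCC` is often used as a training loss in continuous SER. Whether the paper uses that exact loss is not stated in the abstract.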

Keywords

Continuous Speech Emotion Recognition; Deep Neural Networks; Acoustic features

Domains

Artificial intelligence [cs.AI]; Neural networks [cs.NE]

Main file

SPECOM(1).pdf (424.61 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-02945644

Submitted on: Monday, 7 December 2020, 10:06:59

Last modified on: Monday, 3 July 2023, 11:13:55

Long-term archiving on: Monday, 8 March 2021, 18:25:37

Dates and versions

hal-02945644, version 1 (7 December 2020)

Identifiers

HAL Id: hal-02945644, version 1

Cite

Manon Macary, Martin Lebourdais, Marie Tahon, Yannick Estève, Anthony Rousseau. Multi-corpus Experiment on Continuous Speech Emotion Recognition: Convolution or Recurrence? 22nd International Conference on Speech and Computer (SPECOM 2020), Oct 2020, St Petersburg, Russia. ⟨hal-02945644⟩

Collections

UNIV-AVIGNON UNIV-LEMANS LIUM LIA
