Anger Recognition in Speech Using Acoustic and Linguistic Cues

Tim Polzehl; Alexander Schmitt; Florian Metze; Michael Wagner

doi:10.1016/j.specom.2011.05.002

Journal article, Speech Communication, Year: 2011


Tim Polzehl

Role: Corresponding author
PersonId: 935703

Alexander Schmitt

Role: Author

Universität Ulm - Ulm University [Ulm, Germany]

Florian Metze

Role: Author

Carnegie Mellon University [Pittsburgh]

Michael Wagner

Role: Author

University of Canberra

Abstract

The present study elaborates on the exploitation of both linguistic and acoustic feature modeling for anger classification. In terms of acoustic modeling, we generate statistics from acoustic audio descriptors, e.g. pitch, loudness, and spectral characteristics. Ranking our features, we see that loudness and MFCC features seem most promising for all databases. For the English database, pitch features are also important. In terms of linguistic modeling, we apply probabilistic and entropy-based models of words and phrases, e.g. Bag-of-Words, Term Frequency, Term Frequency - Inverse Document Frequency, and the Self-Referential Information (SRI). SRI clearly outperforms the vector space models. Modeling phrases slightly improves the scores. After classifying acoustic and linguistic information on separate levels, we fuse the information on the decision level by adding confidences. We compare the obtained scores on three different databases. Two databases are taken from the IVR customer care domain; the third comes from a WoZ data collection. All corpora contain speech recorded under realistic conditions. We observe promising results for the IVR databases, while the WoZ database shows overall lower scores. To make the results comparable, we evaluate classification success using the f1 measurement in addition to overall accuracy figures. Overall, acoustic modeling clearly outperforms linguistic modeling. Fusion slightly improves overall scores. With a baseline of approximately 60% accuracy and .40 f1-measurement by constant majority class voting, we obtain an accuracy of 75% with respective .70 f1 for the WoZ database. For the IVR databases, we obtain approximately 79% accuracy with respective .78 f1 over a baseline of 60% accuracy with respective .38 f1.
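The two evaluation ideas in the abstract, decision-level fusion by adding confidences and the comparison of accuracy with f1 against a constant majority-class baseline, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the class labels, and the confidence values are hypothetical, and the sketch assumes that the reported f1 is the unweighted mean of the per-class F1 scores. That assumption is at least consistent with the abstract's baseline figures: for a two-class task with a 60% majority class, constant majority voting gives 0.60 accuracy and a mean F1 of 0.375, which rounds to .38.

```python
from typing import Dict, Sequence


def fuse_by_sum(acoustic: Dict[str, float], linguistic: Dict[str, float]) -> str:
    """Decision-level fusion: add the per-class confidences of both
    classifiers and return the class with the highest sum.
    Both inputs are assumed to cover the same set of class labels."""
    summed = {label: acoustic[label] + linguistic[label] for label in acoustic}
    return max(summed, key=summed.get)


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of items whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical confidences for one utterance.
    acoustic_conf = {"anger": 0.55, "non-anger": 0.45}
    linguistic_conf = {"anger": 0.30, "non-anger": 0.70}
    print(fuse_by_sum(acoustic_conf, linguistic_conf))

    # Toy test set with a 60% majority class and a constant-majority baseline.
    y_true = ["non-anger"] * 6 + ["anger"] * 4
    y_base = ["non-anger"] * 10
    print(accuracy(y_true, y_base))
    print(macro_f1(y_true, y_base, ["anger", "non-anger"]))
```

The toy baseline shows why accuracy alone can mislead on imbalanced data: the constant predictor reaches 0.60 accuracy without ever detecting anger, while the mean F1 falls to 0.375 because the anger class contributes an F1 of zero.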

Keywords

emotion detection; anger classification; linguistic and prosodic acoustic modeling; IGR ranking; decision fusion; IVR speech

Domains

Linguistics

Main file

PEER_stage2_10.1016%2Fj.specom.2011.05.002.pdf (323.85 KB)

Origin: Files produced by the author(s)

Contributor: Hal Peer

https://hal.science/hal-00779289

Submitted on: Tuesday, 22 January 2013, 03:50:14

Last modified on: Monday, 4 March 2019, 13:44:09

Long-term archiving on: Saturday, 1 April 2017, 08:04:12

Dates and versions

hal-00779289, version 1 (22-01-2013)

Identifiers

HAL Id: hal-00779289, version 1
DOI: 10.1016/j.specom.2011.05.002

Cite

Tim Polzehl, Alexander Schmitt, Florian Metze, Michael Wagner. Anger Recognition in Speech Using Acoustic and Linguistic Cues. Speech Communication, 2011, 53 (9-10), pp.1198. ⟨10.1016/j.specom.2011.05.002⟩. ⟨hal-00779289⟩


Collections

PEER

190 views

587 downloads
