Multiple topic identification in human/human conversations

X. Bost; G. Senay; M. El-Bèze; R. de Mori

doi:10.1016/j.csl.2015.03.006

Journal article, Computer Speech and Language. Year: 2015


X. Bost

Role: Author
PersonId: 170846
IdHAL: xavier-bost
ORCID: 0000-0002-5624-8721
IdRef: 201681404

Laboratoire Informatique d'Avignon

G. Senay

Role: Author

Laboratoire Informatique d'Avignon

M. El-Bèze

Role: Author

Laboratoire Informatique d'Avignon

R. de Mori

Role: Author
PersonId: 981954

Laboratoire Informatique d'Avignon

School of Computer Science [Montréal]

Abstract

The paper deals with the automatic analysis of real-life telephone conversations between an agent and a customer of a customer care service (CCS). The application domain is the public transportation system in Paris, and the purpose is to collect statistics about customer problems in order to monitor the service and set priorities for interventions that improve user satisfaction. Of primary importance for the analysis is the detection of the themes that are the object of customer problems. Themes are defined in the application requirements and are part of the application ontology that is implicit in the CCS documentation. Due to the variety of the customer population, the structure of conversations with an agent is unpredictable. A conversation may be about one or more themes. Theme mentions can be interleaved with mentions of facts that are irrelevant to the application purpose. Furthermore, in certain conversations theme mentions are localized in specific conversation segments, while in other conversations mentions cannot be localized. As a consequence, approaches to feature extraction with and without mention localization are considered. Application-domain-relevant themes identified by an automatic procedure are expressed by specific sentences whose words are hypothesized by an automatic speech recognition (ASR) system. The ASR system is error-prone, and word error rates can be very high for many reasons, among them unpredictable background noise, speaker accent, and various types of speech disfluencies. As the application task requires the composition of proportions of theme mentions, a sequential decision strategy is introduced in this paper for performing a survey of the large number of conversations made available in a given time period. The strategy has to sample the conversations to form a survey containing enough data analyzed with high accuracy so that proportions can be estimated with sufficient accuracy.
Due to the unpredictable type of theme mentions, it is appropriate to consider methods for theme hypothesization based on global as well as local feature extraction. Two systems are considered by the strategy for each type of feature extraction, giving four in total. One of these four methods is novel: it is based on a new definition of density of theme mentions and on the localization of high-density zones whose boundaries do not need to be precisely detected. The sequential decision strategy starts by grouping theme hypotheses into sets of different expected accuracy and coverage levels. For those sets for which accuracy can be improved with a consequent increase of coverage, a new system with new features is introduced. Its execution is triggered only when specific preconditions are met on the hypotheses generated by the four basic systems. Experimental results are provided on a corpus collected in the call center of the Paris transportation system, known as RATP. The results show that surveys with high accuracy and coverage can be composed with the proposed strategy and systems. This makes it possible to apply a previously published proportion estimation approach that takes hypothesization errors into account.

Keywords

Human/human conversation analysis; Interpretation strategies; Spoken language understanding; Multi-topic identification

Domains

Computation and Language [cs.CL]

Main file

journalCSL.pdf (826.68 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-01956840

Submitted on: Sunday, December 23, 2018, 14:11:28

Last modified on: Thursday, June 18, 2020, 12:32:06

Long-term archiving on: Sunday, March 24, 2019, 13:04:57

Dates and versions

hal-01956840, version 1 (17-12-2018)

hal-01956840, version 2 (23-12-2018)

Identifiers

HAL Id: hal-01956840, version 2
arXiv: 1812.07207
DOI: 10.1016/j.csl.2015.03.006

Cite

X. Bost, G. Senay, M. El-Bèze, R. de Mori. Multiple topic identification in human/human conversations. Computer Speech and Language, 2015, 34 (1), pp.18-42. ⟨10.1016/j.csl.2015.03.006⟩. ⟨hal-01956840v2⟩


Collections

UNIV-AVIGNON LIA

78 views

127 downloads
