Conference paper, Year: 2015

Latent Topic Model Based Representations for a Robust Theme Identification of Highly Imperfect Automatic Transcriptions

Mohamed Morchid
Richard Dufour
Georges Linares
Youssef Hamadi

Abstract

Speech analytics suffers from poor automatic transcription quality. One way to address this difficulty is to map transcriptions into a space of hidden topics. This abstract representation makes it possible to work around the drawbacks of the automatic speech recognition (ASR) process. The best-known and most commonly used such representation is the topic-based one obtained from Latent Dirichlet Allocation (LDA). During the LDA learning process, the distribution of words in each topic is estimated automatically. However, in the context of a classification task, the LDA model does not take the targeted classes into account. The supervised Latent Dirichlet Allocation (sLDA) model overcomes this weakness by considering the class, as a response variable, together with the document content itself. In this paper, we compare these two classical topic-based representations of a dialogue (LDA and sLDA) with a new one based not only on the dialogue content itself (words) but also on the theme related to the dialogue. This original Author-topic Latent Variables (ATLV) representation is based on the Author-topic (AT) model. The effectiveness of the proposed ATLV representation is evaluated on a classification task using automatic transcriptions of Paris Transportation customer service call dialogues. Experiments confirm that the ATLV approach substantially outperforms the LDA and sLDA approaches, with gains of 7.3 and 5.8 points respectively in correctly labeled conversations.
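The idea of mapping a transcription into a space of hidden topics can be illustrated with a minimal sketch. The code below is not the authors' implementation: it is a small collapsed Gibbs sampler for plain (unsupervised) LDA, written only with the Python standard library, and it covers neither sLDA nor the ATLV representation. The function name `lda_topic_features`, the hyperparameter values, and the toy transcripts are all assumptions made for illustration.

```python
import random


def lda_topic_features(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Return one topic distribution per document, estimated by collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                              # n(k)
    assignments = []

    # Random initial topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this word (unnormalised).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions: the "topic space" representation.
    features = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        features.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return features


# Hypothetical, already tokenised transcripts (not data from the paper).
docs = [
    "bus schedule delay bus line".split(),
    "ticket price card refund ticket".split(),
    "bus line strike delay".split(),
    "card refund price".split(),
]
features = lda_topic_features(docs)
for vec in features:
    print([round(p, 3) for p in vec])
```

Each transcription is replaced by a short vector of topic proportions, which could then be passed to any classifier. The point of such a representation, as the abstract argues, is that it depends on word co-occurrence patterns across the corpus rather than on every individual word being transcribed correctly.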
File not deposited

Dates and versions

hal-01293908, version 1 (25-03-2016)

Cite

Mohamed Morchid, Richard Dufour, Georges Linares, Youssef Hamadi. Latent Topic Model Based Representations for a Robust Theme Identification of Highly Imperfect Automatic Transcriptions. 16th International Conference, CICLing 2015, Apr 2015, Cairo, Egypt. ⟨10.1007/978-3-319-18117-2_44⟩. ⟨hal-01293908⟩

Collections

UNIV-AVIGNON LIA