An I-vector Based Approach to Compact Multi-Granularity Topic Spaces Representation of Textual Documents

Abstract : Various studies have highlighted that topic-based approaches provide a powerful representation of the spoken content of documents. Nonetheless, these documents may contain more than one main theme, and their automatic transcription inevitably contains errors. In this study, we propose an original and promising framework based on a compact representation of a textual document, to solve issues related to topic space granularity. First, various topic spaces are estimated with different numbers of classes from a Latent Dirichlet Allocation. Then, this multiple topic space representation is compacted into an elementary segment, called c-vector, originally developed in the context of speaker recognition. Experiments are conducted on the DECODA corpus of conversations. Results show the effectiveness of the proposed multi-view compact representation paradigm. Our identification system reaches an accuracy of 85%, with a significant gain of 9 points compared to the baseline (best single topic space configuration).
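The first step of the abstract (estimating several topic spaces of different sizes and describing each document in all of them) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the collapsed Gibbs sampler, the hyperparameters `alpha` and `beta`, the function names, the toy documents and the topic counts `[2, 3, 5]` are all assumptions chosen for illustration. The second step, compacting the stacked representation into a c-vector, is not shown.

```python
import random


def lda_doc_topics(docs, n_topics, n_iter=50, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not the paper's code).

    Returns one smoothed topic distribution per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * n_words for _ in range(n_topics)]
    nk = [0] * n_topics

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.randrange(n_topics)
            assignments.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(assignments)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], word_id[w]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Smoothed per-document topic proportions (each row sums to 1).
    return [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


def multi_granularity_vectors(docs, topic_counts):
    """Describe each document in several topic spaces and stack the views."""
    views = [lda_doc_topics(docs, k, seed=k) for k in topic_counts]
    return [[p for view in views for p in view[d]] for d in range(len(docs))]


# Hypothetical tokenised documents, not taken from the DECODA corpus.
docs = [
    ["bus", "ticket", "delay", "bus"],
    ["card", "lost", "refund", "card"],
    ["ticket", "refund", "bus"],
]
topic_counts = [2, 3, 5]  # assumed granularities
vectors = multi_granularity_vectors(docs, topic_counts)
```

Each document ends up with a vector of length 2 + 3 + 5 = 10, one block per topic space. In the framework described above, a stacked multi-view representation of this kind is what gets compacted into the c-vector.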
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01318651
Contributor : Bibliothèque Universitaire Déposants Hal-Avignon
Submitted on : Thursday, May 19, 2016 - 5:22:55 PM
Last modification on : Tuesday, July 2, 2019 - 5:38:02 PM

Citation

Mohamed Morchid, Mohamed Bouallegue, Richard Dufour, Georges Linarès, Driss Matrouf, et al. An I-vector Based Approach to Compact Multi-Granularity Topic Spaces Representation of Textual Documents. The 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Oct 2014, Doha, Qatar. ⟨10.3115/v1/D14-1051⟩. ⟨hal-01318651⟩
