On a Topic Model for Sentences

Abstract: Probabilistic topic models are generative models that describe the content of documents by discovering the latent topics underlying them. However, the structure of the textual input, for instance the grouping of words into coherent text spans such as sentences, carries information that these models generally discard. In this paper, we propose sentenceLDA, an extension of LDA that aims to overcome this limitation by incorporating the structure of the text into the generative and inference processes. We illustrate the advantages of sentenceLDA by comparing it with LDA on both intrinsic (perplexity) and extrinsic (text classification) evaluation tasks over different text collections.
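To make the idea of putting sentence structure into the generative process concrete, here is a minimal sketch in Python. It assumes that a single topic is drawn per sentence and shared by all of that sentence's words, whereas plain LDA draws a topic for every word; the abstract does not spell out this detail, so treat it as an illustrative reading and not as the authors' exact formulation. The names `sample_dirichlet` and `generate_document`, the toy `topic_word` matrix, and the hyperparameter values are all hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(sentence_lengths, topic_word, alpha, rng):
    """Generate one document as a list of (topic, [word ids]) sentences.

    Assumption (not stated in the abstract): one topic is drawn per
    sentence and shared by all of its words. Plain LDA would instead
    draw a fresh topic for every word.
    """
    num_topics = len(topic_word)
    vocab = range(len(topic_word[0]))
    theta = sample_dirichlet(alpha, rng)  # document-topic proportions
    document = []
    for length in sentence_lengths:
        # Sentence-level topic assignment.
        topic = rng.choices(range(num_topics), weights=theta)[0]
        # Every word of the sentence comes from that topic's word distribution.
        words = rng.choices(vocab, weights=topic_word[topic], k=length)
        document.append((topic, words))
    return document


rng = random.Random(0)
# Two hypothetical topics over a six-word vocabulary with disjoint support.
topic_word = [
    [1 / 3, 1 / 3, 1 / 3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1 / 3, 1 / 3, 1 / 3],
]
doc = generate_document([4, 3, 5], topic_word, alpha=[0.5, 0.5], rng=rng)
for topic, words in doc:
    print(topic, words)
```

Because the two toy topics use disjoint parts of the vocabulary, each printed sentence contains word ids from only one half of the vocabulary, which is the sentence-level coherence that this reading of the model enforces.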
Document type:
Conference paper
SIGIR'16 ACM SIGIR conference on Research and Development in Information Retrieval, Jul 2016, Pisa, Italy. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 〈http://dl.acm.org/citation.cfm?id=2911451.2914714〉
https://hal.archives-ouvertes.fr/hal-01391903
Contributor: Massih-Reza Amini
Submitted on: Friday, November 4, 2016 - 00:34:04
Last modified on: Friday, November 24, 2017 - 13:31:17

Identifiers

  • HAL Id : hal-01391903, version 1

Citation

Georgios Balikas, Massih-Reza Amini, Marianne Clausel. On a Topic Model for Sentences. SIGIR'16 ACM SIGIR conference on Research and Development in Information Retrieval, Jul 2016, Pisa, Italy. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 〈http://dl.acm.org/citation.cfm?id=2911451.2914714〉. 〈hal-01391903〉
