Topical Coherence in LDA-based Models through Induced Segmentation

This paper presents an LDA-based model that generates topically coherent segments within documents by jointly segmenting documents and assigning topics to their words. Coherence between topics is ensured through a copula that binds the topics associated with the words of a segment. In addition, the model relies on both document-specific and segment-specific topic distributions so as to capture fine-grained differences in topic assignments. We show that the proposed model naturally encompasses other state-of-the-art LDA-based models designed for similar tasks. Furthermore, our experiments, conducted on six publicly available datasets, show the effectiveness of our model in terms of perplexity, Normalized Pointwise Mutual Information (NPMI), which captures the coherence of the generated topics, and the Micro F1 measure for text classification.
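
The three evaluation measures named in the abstract are standard. As a rough illustration only (this is not the authors' evaluation code), the Python sketch below shows one common way to compute each. For a word pair, NPMI(w_i, w_j) = log[P(w_i, w_j) / (P(w_i) P(w_j))] / (-log P(w_i, w_j)), and a topic's coherence is taken here as the mean NPMI over all pairs of its top words. Assumptions: probabilities are document frequencies in a reference corpus given as token lists, pairs that never co-occur score -1, perplexity takes per-token natural-log probabilities produced by some trained model, and Micro F1 pools true/false positives and false negatives over per-document label sets. The function names are hypothetical.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Mean NPMI over all pairs of a topic's top words.

    Probabilities are document frequencies in `documents`, a reference
    corpus given as a list of token lists (an illustrative assumption).
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def doc_prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for wi, wj in combinations(top_words, 2):
        p_i, p_j, p_ij = doc_prob(wi), doc_prob(wj), doc_prob(wi, wj)
        if p_ij == 0.0:
            scores.append(-1.0)  # never co-occur: assumed minimum score
        elif p_ij == 1.0:
            scores.append(1.0)  # co-occur everywhere: -log(p_ij) would be 0
        else:
            pmi = math.log(p_ij / (p_i * p_j))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores)


def perplexity(token_log_probs):
    """exp of the negative mean per-token log-likelihood.

    `token_log_probs` are assumed natural-log probabilities of held-out
    tokens under some trained topic model.
    """
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def micro_f1(gold_labels, predicted_labels):
    """Micro-averaged F1: pool TP, FP and FN over all documents' label sets."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_labels, predicted_labels):
        gold, pred = set(gold), set(pred)
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


docs = [["a", "b"], ["a", "b"], ["c"], ["d"]]
print(npmi_coherence(["a", "b", "c"], docs))
print(perplexity([math.log(0.25)] * 4))
print(micro_f1([{"x"}, {"y"}], [{"x"}, {"x"}]))
```

On the toy inputs, the NPMI is -1/3 (one pair that always co-occurs, two pairs that never do), the perplexity of a uniform distribution over four words is 4 up to floating-point rounding, and the Micro F1 is 0.5 (one true positive, one false positive, one false negative). Higher NPMI indicates more coherent topics, while lower perplexity indicates a better fit to held-out text.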

Domains

Machine Learning [cs.LG], Information Retrieval [cs.IR]

Contributor: Massih-Reza Amini

https://hal.science/hal-01769776

Submitted on: Wednesday 18 April 2018, 13:27:24

Last modified on: Sunday 14 April 2024, 03:20:32

Dates and versions

hal-01769776, version 1 (18-04-2018)

Identifiers

HAL Id: hal-01769776, version 1
DOI: 10.18653/v1/P17-1165

Cite

Hesam Amoualian, Wei Lu, Eric Gaussier, Georgios Balikas, Massih-Reza Amini, et al. Topical Coherence in LDA-based Models through Induced Segmentation. 55th Annual Meeting of the Association for Computational Linguistics, Jul 2017, Vancouver, Canada. pp. 1799-1809, ⟨10.18653/v1/P17-1165⟩. ⟨hal-01769776⟩

Collections

UGA CNRS INRIA LIG LJK LJK_PS PERSYVAL-LAB LJK-PS-DAO ANR LIG_SIDCH LIG_SIDCH_APTIKAL