Conference paper, Year: 2019

Correlation between textual similarity and quality of LDA topic model results

Abstract

The Latent Dirichlet Allocation (LDA) topic model describes a corpus on the basis of its vocabulary. Our experiment aims to determine whether the quality of LDA outputs can be estimated through text similarity metrics and, if so, which metric is the most relevant. To do so, we use a categorised corpus and apply these metrics to every pair of categories. We present correlation scores between several metrics and the quality of the topic model. The experiments also include a comparison between simple and complex term extraction within our framework. We observed very high correlations with the Hellinger distance, with or without complex terms, while the Soergel distance is most effective when complex terms are included. These experiments are a case study on a categorised corpus of 20,000 article abstracts.
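As an illustration only, the two distances singled out in the abstract can be sketched in Python from their standard textbook definitions. This record does not say how the authors build term vectors for each category or how they score topic model quality, so the helper `term_distribution`, the toy token lists, and the choice of relative term frequencies as input are all assumptions made for this sketch, not the authors' method.

```python
import math
from collections import Counter


def term_distribution(tokens, vocabulary):
    """Relative frequency of each vocabulary term (hypothetical helper)."""
    counts = Counter(tokens)
    total = sum(counts[t] for t in vocabulary)
    if total == 0:
        return [0.0 for _ in vocabulary]
    return [counts[t] / total for t in vocabulary]


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


def soergel(x, y):
    """Soergel distance: sum of |x_i - y_i| over sum of max(x_i, y_i)."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    return numerator / denominator if denominator else 0.0


# Toy data standing in for two categories of abstracts (assumed, not from the paper).
tokens_a = "topic model corpus vocabulary topic".split()
tokens_b = "protein gene corpus sequence gene".split()
vocabulary = sorted(set(tokens_a) | set(tokens_b))

p = term_distribution(tokens_a, vocabulary)
q = term_distribution(tokens_b, vocabulary)

print("Hellinger:", hellinger(p, q))
print("Soergel:  ", soergel(p, q))
```

Under these definitions both distances are 0 for identical distributions and 1 for distributions that share no terms, which makes them directly comparable across category pairs.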
File not deposited

Dates and versions

hal-02390357, version 1 (03-12-2019)

Cite

Amaury Delamaire, Mihaela Juganaru-Mathieu, Michel Beigbeder. Correlation between textual similarity and quality of LDA topic model results. 2019 13th International Conference on Research Challenges in Information Science (RCIS), May 2019, Brussels, Belgium. pp.1-6, ⟨10.1109/RCIS.2019.8877076⟩. ⟨hal-02390357⟩