Streaming-LDA: A Copula-based Approach to Modeling Topic Dependencies in Document Streams

Abstract: In this paper we propose two new models of the topic and word-topic dependencies between consecutive documents in a document stream. The first model is a direct extension of the Latent Dirichlet Allocation (LDA) model and uses a Dirichlet distribution to balance the influence of the LDA prior parameters with respect to the topic and word-topic distributions of the previous document. The second extension uses copulas, which are a generic tool for modeling dependencies between random variables. We rely here on Archimedean copulas, and more precisely on Frank copulas, as they are symmetric and associative and thus appropriate for exchangeable random variables. Our experiments, conducted on three standard collections that have been used in several studies on topic modeling, show that our proposals outperform previous ones (such as dynamic topic models and temporal LDA), both in terms of perplexity and for tracking similar topics in a document stream.
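For readers unfamiliar with the copula family named in the abstract, the following is a minimal sketch of the standard bivariate Frank copula, C(u, v) = -(1/theta) ln(1 + (e^(-theta u) - 1)(e^(-theta v) - 1) / (e^(-theta) - 1)), with a numerical check of the symmetry and associativity properties the abstract mentions. It is an illustration only and is not taken from the paper: the function name `frank_copula` and the parameter values are assumptions chosen for the example.

```python
import math

def frank_copula(u: float, v: float, theta: float) -> float:
    """Bivariate Frank copula C_theta(u, v), for u, v in [0, 1] and theta != 0.

    Hypothetical helper for illustration; not the authors' code.
    """
    if theta == 0:
        raise ValueError("theta must be non-zero (theta -> 0 is the independence limit)")
    # expm1(x) = e^x - 1 and log1p(x) = ln(1 + x), both accurate for small x.
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    den = math.expm1(-theta)
    return -math.log1p(num / den) / theta

theta = 2.0  # assumed dependence parameter, chosen arbitrarily
u, v, w = 0.3, 0.7, 0.5

# Symmetry: C(u, v) equals C(v, u).
symmetric = frank_copula(u, v, theta) == frank_copula(v, u, theta)

# Associativity: C(C(u, v), w) equals C(u, C(v, w)) up to rounding.
left = frank_copula(frank_copula(u, v, theta), w, theta)
right = frank_copula(u, frank_copula(v, w, theta), theta)
associative = math.isclose(left, right, rel_tol=1e-9)

print(symmetric, associative)
```

Symmetry and associativity together are what make the value of the copula independent of the order in which the variables are combined, which is why the abstract describes this family as suited to exchangeable random variables.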
Document type:
Conference paper
22nd ACM SIGKDD Conference Knowledge Discovery and Data Mining, Aug 2016, San Francisco, United States. 2016, <10.1145/2939672.2939781>

https://hal.archives-ouvertes.fr/hal-01344779
Contributor: Hesam Amoualian
Submitted on: Tuesday, 12 July 2016 - 15:19:19
Last modified on: Tuesday, 6 December 2016 - 01:02:09

File

StreamingCopulaLDA-KDD.pdf
Files produced by the author(s)

Citation

Hesam Amoualian, Marianne Clausel, Eric Gaussier, Massih-Reza Amini. Streaming-LDA: A Copula-based Approach to Modeling Topic Dependencies in Document Streams. 22nd ACM SIGKDD Conference Knowledge Discovery and Data Mining, Aug 2016, San Francisco, United States. 2016, <10.1145/2939672.2939781>. <hal-01344779>

Metrics

Record views: 200
Document downloads: 68