Conference papers

Streaming-LDA: A Copula-based Approach to Modeling Topic Dependencies in Document Streams

Abstract: We propose two new models for capturing topic and word-topic dependencies between consecutive documents in document streams. The first model is a direct extension of the Latent Dirichlet Allocation (LDA) model and uses a Dirichlet distribution to balance the influence of the LDA prior parameters with respect to the topic and word-topic distributions of the previous document. The second extension uses copulas, which constitute a generic tool for modeling dependencies between random variables. We rely here on Archimedean copulas, and more precisely on Frank copulas, as they are symmetric and associative and are thus appropriate for exchangeable random variables. Our experiments, conducted on three standard collections that have been used in several studies on topic modeling, show that our proposals outperform previous ones (such as dynamic topic models and temporal LDA), both in terms of perplexity and for tracking similar topics in a document stream.
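The symmetry and associativity properties mentioned in the abstract can be checked numerically on the standard bivariate Frank copula, C(u, v) = -(1/theta) * log(1 + (exp(-theta*u) - 1)(exp(-theta*v) - 1) / (exp(-theta) - 1)). The following is a minimal sketch only: the function name `frank_copula` and the value of `theta` are illustrative assumptions, and the code is not taken from the paper or from the authors' implementation.

```python
import math


def frank_copula(u: float, v: float, theta: float) -> float:
    """Bivariate Frank copula CDF C(u, v) for u, v in [0, 1] and theta != 0.

    Hypothetical helper for illustration; not the authors' code.
    """
    if theta == 0:
        raise ValueError("theta must be non-zero (theta -> 0 is the independence limit)")
    # expm1(x) = exp(x) - 1 and log1p(x) = log(1 + x), used for numerical accuracy.
    numerator = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(numerator / math.expm1(-theta)) / theta


theta = 2.5          # assumed dependence parameter, chosen arbitrarily
u, v, w = 0.3, 0.7, 0.5

# Symmetry: C(u, v) = C(v, u)
symmetric_gap = abs(frank_copula(u, v, theta) - frank_copula(v, u, theta))

# Associativity: C(C(u, v), w) = C(u, C(v, w))
left = frank_copula(frank_copula(u, v, theta), w, theta)
right = frank_copula(u, frank_copula(v, w, theta), theta)
associative_gap = abs(left - right)

# Uniform margins: C(u, 1) = u
margin_gap = abs(frank_copula(u, 1.0, theta) - u)
```

All three gaps should be zero up to floating-point rounding, which is what makes this copula family a natural fit for exchangeable random variables.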

https://hal.archives-ouvertes.fr/hal-01344779
Contributor : Hesam Amoualian
Submitted on : Tuesday, July 12, 2016 - 3:19:19 PM
Last modification on : Monday, April 20, 2020 - 11:24:02 AM

File

StreamingCopulaLDA-KDD.pdf
Files produced by the author(s)

Citation

Hesam Amoualian, Marianne Clausel, Eric Gaussier, Massih-Reza Amini. Streaming-LDA: A Copula-based Approach to Modeling Topic Dependencies in Document Streams. 22nd ACM SIGKDD Conference Knowledge Discovery and Data Mining, Aug 2016, San Francisco, United States. pp.695-704 ⟨10.1145/2939672.2939781⟩. ⟨hal-01344779⟩
