Conference papers

Text mining with constrained tensor decomposition

Abstract : Text mining, as a special case of data mining, refers to estimating the knowledge or parameters needed for a given purpose, such as unsupervised clustering, from a collection of observed documents. In this context, the topic of a document can be seen as a hidden variable, and the words are multi-view variables that are related to each other through the topic. The main goal of this paper is to estimate the probability of each topic and the conditional probability of the words given each topic. To this end, we use a nonnegative Canonical Polyadic (CP) decomposition of the third-order moment tensor of the observed words. Our computer simulations show that the proposed algorithm performs better than a previously proposed algorithm, which applies the robust tensor power method after whitening with the second-order moment. Moreover, because our cost function includes a nonnegativity constraint on the estimated probabilities, we never obtain negative estimated probabilities, whereas negative values often arise with the power method combined with deflation. In addition, unlike deflation-based techniques, our algorithm can handle over-complete cases, where the number of hidden variables is larger than the number of multi-view variables. Further, the proposed method supports a larger degree of over-completeness than the modified versions of the tensor power method that have been customized to handle the over-complete case.
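The abstract does not give the exact cost function or update rules, so the following is only a minimal sketch under stated assumptions. It assumes a single-topic model in which each document's first three words are drawn independently given its topic, so that the third-order moment of the one-hot word vectors is E[x1 ⊗ x2 ⊗ x3] = Σ_k w_k μ_k ⊗ μ_k ⊗ μ_k, where w_k is the probability of topic k and μ_k is the word distribution given topic k. It then fits a nonnegative CP model using standard Lee-Seung style multiplicative updates. These updates are swapped in here for illustration and are not necessarily the authors' algorithm. The helper names (`empirical_third_moment`, `nonneg_cp`, `to_topic_model`) and the synthetic corpus parameters are hypothetical.

```python
import numpy as np


def empirical_third_moment(docs, vocab_size):
    """Estimate M3[i, j, l] = P(x1 = i, x2 = j, x3 = l) from documents.

    Assumption: each document is a sequence of at least three integer word ids,
    and its first three words serve as the three "views".
    """
    m3 = np.zeros((vocab_size,) * 3)
    triples = np.asarray([d[:3] for d in docs])
    # Count each observed (word1, word2, word3) triple; np.add.at accumulates repeats.
    np.add.at(m3, (triples[:, 0], triples[:, 1], triples[:, 2]), 1.0)
    return m3 / len(docs)


def nonneg_cp(tensor, rank, n_iter=1000, seed=0, eps=1e-12):
    """Nonnegative CP fit of a third-order tensor by multiplicative updates.

    Minimizes ||T - sum_k a_k o b_k o c_k||_F^2 subject to A, B, C >= 0.
    Each update multiplies a positive factor by a nonnegative ratio, so the
    entries can never become negative.
    """
    rng = np.random.default_rng(seed)
    A, B, C = (rng.uniform(0.1, 1.0, (n, rank)) for n in tensor.shape)
    for _ in range(n_iter):
        # Numerator: MTTKRP for the mode. Denominator: factor times Gram Hadamard product.
        A *= np.einsum("ijl,jk,lk->ik", tensor, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijl,ik,lk->jk", tensor, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijl,ik,jk->lk", tensor, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C


def to_topic_model(A, B, C):
    """Map CP factors to (topic probabilities, P(word | topic) matrix).

    Each column is rescaled to sum to 1, and the removed scales are absorbed
    into the topic weight. Averaging the three views assumes that they share
    the same word distributions.
    """
    sa, sb, sc = A.sum(axis=0), B.sum(axis=0), C.sum(axis=0)
    weights = sa * sb * sc
    word_given_topic = (A / sa + B / sb + C / sc) / 3.0
    return weights, word_given_topic


# Synthetic single-topic corpus (hypothetical parameters, for illustration only).
rng = np.random.default_rng(1)
true_w = np.array([0.6, 0.4])
true_mu = np.array([[0.40, 0.02],
                    [0.30, 0.03],
                    [0.20, 0.05],
                    [0.05, 0.20],
                    [0.03, 0.30],
                    [0.02, 0.40]])  # column k is P(word | topic k)
n_docs = 20000
topics = rng.choice(2, size=n_docs, p=true_w)
docs = [rng.choice(6, size=3, p=true_mu[:, t]) for t in topics]

m3 = empirical_third_moment(docs, vocab_size=6)
A, B, C = nonneg_cp(m3, rank=2)
weights, word_given_topic = to_topic_model(A, B, C)
fit = np.einsum("ik,jk,lk->ijl", A, B, C)
rel_err = np.linalg.norm(m3 - fit) / np.linalg.norm(m3)
print(np.round(weights, 2))
print(np.round(word_given_topic, 2))
print(round(float(rel_err), 3))
```

The recovered topics are only defined up to a permutation of the columns. The nonnegativity of `weights` and `word_given_topic` follows from the multiplicative form of the updates. This is the property that the abstract contrasts with deflation-based power iterations. The sketch does not cover the over-complete setting that the abstract discusses.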

Cited literature [27 references]

https://hal.archives-ouvertes.fr/hal-02084803
Contributor : Pierre Comon
Submitted on : Monday, July 1, 2019 - 5:25:44 PM
Last modification on : Wednesday, May 13, 2020 - 4:30:06 PM

File

SobhCJB19_LOD28.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02084803, version 3

Citation

Elaheh Sobhani, Pierre Comon, Christian Jutten, Massoud Babaie-Zadeh. Text mining with constrained tensor decomposition. Fifth International Conference on Machine Learning, Optimization, and Data Science, Sep 2019, Certosa di Pontignano, Siena, Italy. ⟨hal-02084803⟩


Metrics

Record views : 221
File downloads : 178