Total Variability Space for LDA-based multi-view text categorization - HAL open archive
Journal article, IEEE/ACM Transactions on Audio, Speech and Language Processing, Year: 2015

Total Variability Space for LDA-based multi-view text categorization

Abstract

Mapping text documents into an LDA-based topic space is a classical way to extract a high-level representation of text documents. Unfortunately, LDA is highly sensitive to hyper-parameters related to the number of classes or to the word and topic distributions, and there is no systematic way to estimate optimal configurations a priori. Moreover, various hyper-parameter configurations offer complementary views of the document. In this paper, we propose a method based on a two-step process that, first, expands the representation space by using a set of topic spaces and, second, compacts the representation space by removing poorly relevant dimensions. These two steps are based, respectively, on multi-view LDA-based representation spaces and factor-analysis models. This model provides a view-independent representation of documents while extracting complementary information from a massive multi-view representation. Experiments are conducted on the DECODA conversation corpus and the Reuters-21578 textual dataset. Results show the effectiveness of the proposed multi-view compact representation paradigm. The proposed categorization system reaches an accuracy of 86.9% and 86.5% with manual and automatic transcriptions of conversations, respectively, and a macro-F1 of 80% on a classification task on the well-studied Reuters-21578 corpus, a significant gain compared to the baseline (best single topic space configuration) as well as to previously studied methods and document representations.
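
The two-step process described in the abstract (expand with several topic spaces, then compact with a factor-analysis model) can be sketched roughly as follows. This is a minimal illustration and not the authors' system: it assumes scikit-learn, uses `FactorAnalysis` as a simple stand-in for the total variability model named in the title, and the function name `multi_view_compact`, the toy documents, and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def multi_view_compact(docs, topic_counts=(2, 3, 4), n_factors=2, seed=0):
    """Expand docs into several LDA topic spaces, then compact them.

    Hypothetical helper: each entry of topic_counts defines one "view".
    """
    counts = CountVectorizer().fit_transform(docs)

    # Step 1 (expansion): one LDA model per hyper-parameter configuration.
    views = []
    for k in topic_counts:
        lda = LatentDirichletAllocation(n_components=k, random_state=seed)
        views.append(lda.fit_transform(counts))  # document-topic proportions
    expanded = np.hstack(views)  # concatenated multi-view representation

    # Step 2 (compaction): plain factor analysis, assumed here as a
    # stand-in for the factor-analysis model used in the paper.
    fa = FactorAnalysis(n_components=n_factors, random_state=seed)
    compact = fa.fit_transform(expanded)
    return expanded, compact


# Toy documents, invented for illustration only.
docs = [
    "bus ticket price and monthly pass",
    "lost bag on the metro line",
    "train schedule and traffic delay",
    "oil prices rise on the market",
    "grain exports and trade figures",
    "bank raises interest rate",
]
expanded, compact = multi_view_compact(docs)
print(expanded.shape, compact.shape)
```

The compact vectors could then be passed to any classifier; the choice of three views and two factors here is arbitrary and only keeps the example small.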
Main file
morchid_Decoda_Ivectors.pdf (685.49 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01322940, version 1 (20-12-2018)

Cite

Mohamed Morchid, Mohamed Bouallegue, Richard Dufour, Georges Linarès, Renato de Mori. Total Variability Space for LDA-based multi-view text categorization. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2015, ⟨10.1109/TASLP.2015.2431854⟩. ⟨hal-01322940⟩