An Intra and Inter-Topic Evaluation and Cleansing Method - HAL open archive
Journal article, Romanian Journal of Human - Computer Interaction, Year: 2010

An Intra and Inter-Topic Evaluation and Cleansing Method

Claudiu Musat
  • Role: Corresponding author
  • PersonId : 885901

Stefan Trausan-Matu
  • Role: Author
  • PersonId : 885902

Abstract

Topic modeling is a growing research field, and novel ways of interpreting and evaluating results are necessary. We propose a method, relying on WordNet data, for evaluating and improving the performance of algorithms that generate topic models. We first propose a measure for determining a topic model's fitness, factoring in its broadness and redundancy. Then, for each individual topic, the amount of relevant information it provides, along with its most important words and related concepts, is determined by defining a cohesion function based on the topic's projection on WordNet concepts. The model as a whole is improved by eliminating each topic's outliers with respect to the ontology projection. We define an inter-topic, ontology-based distance and use it to investigate the impact of removing redundant topics from a model with regard to the overlap between topics' ontological projections. Clustering similar topics into conceptually cohesive groups is explored as an alternative to pruning less relevant topics. Results show that evaluating and improving statistical models with WordNet is a promising research track that leads to more coherent topic models.
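
The abstract does not give formulas, so the following is only a minimal illustrative sketch of the general idea of projecting topics onto an ontology, not the authors' method. It assumes a tiny hand-written hypernym map (`HYPERNYMS`, hypothetical) in place of WordNet, treats a word as an outlier when it shares no concept with any other word in its topic, and uses a Jaccard distance over concept sets as a stand-in for the inter-topic distance. All function names are illustrative.

```python
from collections import Counter

# Hypothetical toy hypernym map (child -> parent); a real run would query WordNet.
HYPERNYMS = {
    "dog": "canine", "wolf": "canine", "canine": "animal",
    "cat": "feline", "feline": "animal", "animal": "entity",
    "car": "vehicle", "truck": "vehicle", "vehicle": "artifact",
    "artifact": "entity",
}
ROOT = "entity"


def concepts(word):
    """Return the word plus all its ancestors, excluding the uninformative root."""
    found = set()
    while word != ROOT:
        found.add(word)
        word = HYPERNYMS.get(word, ROOT)  # unknown words attach directly to the root
    return found


def projection(topic):
    """Count, for each concept, how many topic words fall under it."""
    counts = Counter()
    for word in topic:
        counts.update(concepts(word))
    return counts


def split_outliers(topic):
    """Separate words that share no concept with any other topic word (assumed criterion)."""
    counts = projection(topic)
    kept = [w for w in topic if any(counts[c] > 1 for c in concepts(w))]
    dropped = [w for w in topic if w not in kept]
    return kept, dropped


def topic_distance(topic_a, topic_b):
    """Jaccard distance between the concept sets of two topics (assumed measure)."""
    a, b = set(projection(topic_a)), set(projection(topic_b))
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    print(split_outliers(["dog", "wolf", "cat", "car"]))
    print(topic_distance(["dog", "wolf"], ["dog", "cat"]))
```

In this toy setting, "car" is separated from the animal words because none of its concepts is shared, and two topics with heavily overlapping projections get a small distance, which is the kind of signal a redundancy check or a topic clustering step could use.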
Main file
RRIOC-2010.pdf (278.01 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00552105, version 1 (27-09-2011)

Identifiers

  • HAL Id: hal-00552105, version 1

Cite

Claudiu Musat, Marian-Andrei Rizoiu, Stefan Trausan-Matu. An Intra and Inter-Topic Evaluation and Cleansing Method. Romanian Journal of Human - Computer Interaction, 2010, 3 (2), pp.81 - 96. ⟨hal-00552105⟩
143 views
78 downloads
