Book chapter, Year: 2009

Exploring Validity Indices for Clustering Textual Data

Ahmad Elsayed
  • Role: Author
Hakim Hacid
  • Role: Author

Abstract

The goal of any clustering algorithm that produces flat partitions of data is to find both the optimal clustering solution and the optimal number of clusters. One natural way to reach this goal without the need for parameters is to involve a validity index in the clustering process, which can lead to an objective selection of the optimal number of clusters. In this chapter, we provide two main contributions. First, since validity indices have mostly been studied on two- or three-dimensional datasets, we have chosen to evaluate them in real-world applications: document and word clustering. Second, we propose a new context-aware method that aims at enhancing the use of validity indices as stopping criteria in agglomerative algorithms. Experimental results show that the method is a step forward in using validity indices as stopping criteria with more reliability.
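To make the idea of selecting the number of clusters with a validity index concrete, here is a minimal Python sketch. It is not the chapter's context-aware method: it assumes average-linkage agglomerative merging and uses the mean silhouette width as a stand-in validity index, on toy 2D points rather than textual data. The function names (`average_linkage`, `silhouette`, `best_partition`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b):
    """Mean pairwise distance between two clusters of points."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def silhouette(clusters):
    """Mean silhouette width of a flat partition (higher is better)."""
    scores = []
    for i, cluster in enumerate(clusters):
        for p in cluster:
            if len(cluster) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance from p to the other members of its own cluster
            a = sum(dist(p, q) for q in cluster) / (len(cluster) - 1)
            # b: mean distance from p to the nearest other cluster
            b = min(
                sum(dist(p, q) for q in other) / len(other)
                for j, other in enumerate(clusters)
                if j != i
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_partition(points):
    """Merge bottom-up and keep the partition with the highest index value."""
    clusters = [[p] for p in points]
    best_score, best = float("-inf"), None
    while len(clusters) > 2:
        # Find the closest pair of clusters under average linkage and merge it.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        # Score the partition obtained after this merge.
        score = silhouette(clusters)
        if score > best_score:
            best_score, best = score, [list(c) for c in clusters]
    return best, best_score


# Toy data (an assumption for illustration): three well-separated groups.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
partition, score = best_partition(points)
print(len(partition), round(score, 3))
```

This sketch scores every level of the hierarchy and keeps the best one afterwards. Using the index as a true stopping criterion, as the abstract describes, would instead halt merging as soon as the index degrades, which is the setting where reliability matters most.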

Dates and versions

hal-00628587, version 1 (03-10-2011)

Cite

Ahmad Elsayed, Hakim Hacid, Djamel Abdelkader Zighed. Exploring Validity Indices for Clustering Textual Data. Djamel Zighed and Shusaku Tsumoto and Zbigniew Ras and Hakim Hacid. Mining Complex Data, Springer, pp.281-300, 2009, Studies in Computational Intelligence vol 165/2009, ⟨10.1007/978-3-540-88067-7_16⟩. ⟨hal-00628587⟩