Exploring Validity Indices for Clustering Textual Data

Ahmad Elsayed; Hakim Hacid; Djamel Abdelkader Zighed

doi:10.1007/978-3-540-88067-7_16

Book chapter, 2009


Equipe de Recherche en Ingénierie des Connaissances

Abstract

The goal of any clustering algorithm that produces flat partitions of data is to find both the optimal clustering solution and the optimal number of clusters. One natural way to reach this goal without having to set parameters is to involve a validity index in the clustering process, which can lead to an objective selection of the optimal number of clusters. In this chapter, we make two main contributions. First, since validity indices have mostly been studied on two- or three-dimensional datasets, we evaluate them on real-world applications: document clustering and word clustering. Second, we propose a new context-aware method that aims to improve the use of validity indices as stopping criteria in agglomerative algorithms. Experimental results show that the method is a step forward toward using validity indices more reliably as stopping criteria.
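The abstract does not detail the context-aware method, so the sketch below only illustrates the general idea it builds on: running an agglomerative algorithm and using a validity index to pick the number of clusters instead of fixing it in advance. It is not the chapter's method. The choice of the silhouette index, average linkage, Euclidean distance, the toy 2-D points, and the function names `silhouette`, `average_linkage` and `agglomerate_with_index` are all assumptions made for illustration.

```python
import math
from itertools import combinations


def silhouette(points, labels):
    """Mean silhouette width of a partition (Euclidean distance, assumed)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton's silhouette is taken to be 0
        # a: mean distance to the other members of p's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def average_linkage(points, c1, c2):
    """Mean pairwise distance between two clusters of point indices."""
    return sum(math.dist(points[i], points[j]) for i in c1 for j in c2) / (
        len(c1) * len(c2)
    )


def agglomerate_with_index(points, min_k=2):
    """Hypothetical helper: merge bottom-up, keep the best-scoring partition.

    Returns (best_k, best_score, best_labels).
    """
    clusters = [[i] for i in range(len(points))]
    best_k, best_score, best_labels = None, -math.inf, None
    while len(clusters) > min_k:
        # Merge the closest pair of clusters under average linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                points, clusters[pair[0]], clusters[pair[1]]
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
        # Score the new partition with the validity index.
        labels = [0] * len(points)
        for cid, members in enumerate(clusters):
            for m in members:
                labels[m] = cid
        score = silhouette(points, labels)
        if score > best_score:
            best_k, best_score, best_labels = len(clusters), score, labels
    return best_k, best_score, best_labels


# Usage: three well-separated groups of toy 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
k, score, labels = agglomerate_with_index(points)
print(k, round(score, 3), labels)
```

Here the whole hierarchy is built and scored, and the best level is chosen afterwards; using the index as a stopping criterion instead means halting the merges once the score stops improving, which is where the reliability issues studied in the chapter arise. For document or word clustering, the points would be high-dimensional vectors (for example TF-IDF weights), and a cosine-based distance is a common substitute for the Euclidean one assumed here.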

Domains

Machine Learning [cs.LG]

https://hal.science/hal-00628587

Submitted on: Monday, 3 October 2011, 16:27:14

Last modified on: Friday, 24 February 2023, 12:08:52

Dates and versions

hal-00628587 , version 1 (03-10-2011)

Identifiers

HAL Id : hal-00628587 , version 1
DOI : 10.1007/978-3-540-88067-7_16

Cite

Ahmad Elsayed, Hakim Hacid, Djamel Abdelkader Zighed. Exploring Validity Indices for Clustering Textual Data. In: Djamel Zighed, Shusaku Tsumoto, Zbigniew Ras, Hakim Hacid (eds.), Mining Complex Data, Springer, pp. 281-300, 2009, Studies in Computational Intelligence vol. 165/2009, ⟨10.1007/978-3-540-88067-7_16⟩. ⟨hal-00628587⟩


Collections

UNIV-LYON2 ERIC UDL

