Evaluation of predictive clustering quality

Oumaima Alaoui Ismaili; Vincent Lemaire; Antoine Cornuéjols

Conference paper. Year: 2016

Oumaima Alaoui Ismaili

Role: Author

Mathématiques et Informatique Appliquées

Orange Labs

Vincent Lemaire

Role: Author

Orange Labs

Antoine Cornuéjols

Role: Author
PersonId: 182386
IdHAL: antoine-cornuejols
ORCID: 0000-0002-2979-3521
IdRef: 067132669

Mathématiques et Informatique Appliquées

Abstract

Predictive clustering [1] is a new supervised learning framework derived from traditional clustering. Predictive clustering algorithms start by identifying clusters that are pure (in terms of classes) and have a high probability density. Using the information carried by these clusters, they can then predict the class of new instances. Compared to supervised classification, predictive clustering can discover the internal structure of the target class. It thus allows users to find the different reasons behind the same prediction: two heterogeneous instances may receive the same predicted label. By its nature, predictive clustering combines the characteristics of both supervised classification and clustering. The evaluation of predictive clustering results should therefore take three points into account: high intra-cluster similarity, low inter-cluster similarity, and a good prediction rate. A predictive clustering quality criterion must balance these three points. In this work, we propose a new criterion for measuring predictive clustering quality. This criterion computes the compactness and the separability of clusters using a new supervised similarity measure. The measure exploits the information given by the target class in such a way that two instances are considered similar if and only if the distance between them is small and they belong to the same class, and dissimilar if and only if the distance between them is large and they belong to different classes. Results obtained on several simulated datasets show that the proposed criterion consistently yields the optimal number of clusters. To our knowledge, no analytic criterion in the state of the art can measure the quality of the results produced by predictive clustering algorithms (i.e., the trade-off described above), and hence none can serve as a direct baseline for our criterion.
For comparison, we therefore use the well-known unsupervised Davies-Bouldin criterion [2] and two supervised criteria, the Adjusted Rand Index [3] and the Variation of Information [4], and we examine whether our criterion finds a good trade-off.
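The three comparison criteria named above have standard closed forms, so a minimal NumPy sketch may help readers reproduce this kind of comparison. It assumes the usual textbook definitions: the pair-counting Adjusted Rand Index in its Hubert-Arabie form, the Variation of Information VI = H(A) + H(B) - 2 I(A; B) in nats, and the Davies-Bouldin index with Euclidean centroid scatter. In practice, `sklearn.metrics.adjusted_rand_score` and `sklearn.metrics.davies_bouldin_score` cover the first and last. The function `toy_supervised_pair_scores` and its threshold `tau` are hypothetical and purely illustrative: they encode the "similar if and only if close and same class, dissimilar if and only if far and different classes" rule stated in the abstract, and are not the criterion proposed in the paper.

```python
import numpy as np


def contingency(a, b):
    """Contingency table: n[i, j] = number of points with label i in a and label j in b."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=np.int64)
    np.add.at(table, (ai, bi), 1)
    return table


def comb2(x):
    """Number of unordered pairs, x choose 2, computed elementwise."""
    x = np.asarray(x, dtype=np.float64)
    return x * (x - 1.0) / 2.0


def adjusted_rand_index(a, b):
    """Pair-counting Adjusted Rand Index (Hubert-Arabie form); 1 means identical partitions."""
    n = contingency(a, b)
    sum_ij = comb2(n).sum()
    sum_a = comb2(n.sum(axis=1)).sum()
    sum_b = comb2(n.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(n.sum())
    max_index = (sum_a + sum_b) / 2.0
    if max_index == expected:  # degenerate: both labelings are one cluster, or both all singletons
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def variation_of_information(a, b):
    """VI = H(A) + H(B) - 2 I(A; B) in nats; 0 means identical partitions."""
    p = contingency(a, b) / len(a)
    pa, pb = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    h_a = -np.sum(pa * np.log(pa))  # every label occurs at least once, so pa, pb > 0
    h_b = -np.sum(pb * np.log(pb))
    mi = np.sum(p[nz] * np.log(p[nz] / np.outer(pa, pb)[nz]))
    return h_a + h_b - 2.0 * mi


def davies_bouldin(X, labels):
    """Davies-Bouldin index (lower is better); assumes at least two clusters."""
    X, labels = np.asarray(X, dtype=np.float64), np.asarray(labels)
    ks = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ks])
    # s_i: mean Euclidean distance of cluster i's points to its centroid
    scatter = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                        for k, c in zip(ks, centroids)])
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    # R_ij = (s_i + s_j) / d(c_i, c_j); zero distances (incl. diagonal) are mapped to inf
    ratios = (scatter[:, None] + scatter[None, :]) / np.where(dist > 0, dist, np.inf)
    np.fill_diagonal(ratios, -np.inf)  # exclude j == i from the max
    return ratios.max(axis=1).mean()


def toy_supervised_pair_scores(X, y, clusters, tau):
    """HYPOTHETICAL illustration, not the paper's criterion.

    A pair is 'similar' iff same class and distance < tau, and 'dissimilar' iff
    different classes and distance >= tau. Returns (compactness, separability):
    the share of intra-cluster pairs that are similar and the share of
    inter-cluster pairs that are dissimilar.
    """
    X = np.asarray(X, dtype=np.float64)
    y, clusters = np.asarray(y), np.asarray(clusters)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    same_class = y[:, None] == y[None, :]
    similar = same_class & (d < tau)
    dissimilar = ~same_class & (d >= tau)
    iu = np.triu_indices(len(X), k=1)  # each unordered pair once
    intra = (clusters[:, None] == clusters[None, :])[iu]
    return similar[iu][intra].mean(), dissimilar[iu][~intra].mean()
```

Lower values are better for Davies-Bouldin and VI, whereas an ARI of 1 (equivalently, a VI of 0) means the clustering reproduces the class labels exactly. The toy pair scores depend entirely on the assumed threshold `tau`, which is one reason a principled supervised similarity measure is needed.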

Keywords

clustering quality

Domains

Life Sciences [q-bio]

Archive Ouverte ProdInra

https://hal.science/hal-01559703

Submitted on: Monday, 10 July 2017, 20:50:00

Last modified on: Tuesday, 12 March 2024, 10:44:55

Dates and versions

hal-01559703, version 1 (10-07-2017)

Identifiers

HAL Id: hal-01559703, version 1
PRODINRA: 396722

Cite

Oumaima Alaoui Ismaili, Vincent Lemaire, Antoine Cornuéjols. Evaluation of predictive clustering quality. Workshop on Model-Based Clustering and Classification, 2016, Catania, Italy. ⟨hal-01559703⟩

Collections

AGROPARISTECH INRA MIA-PARIS UNIV-PARIS-SACLAY INRAE GS-COMPUTER-SCIENCE MATHNUM
