Conference papers

Sélection de mesures de similarité pour les données catégorielles (Selection of similarity measures for categorical data)

Guilherme Alves 1 Miguel Couceiro 1 Amedeo Napoli 1 
1 ORPAILLEUR - Knowledge representation, reasoning
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Data clustering is a well-known data mining task that often relies on distances or, in some cases, on similarity measures. The latter is the case for real-world datasets that comprise categorical attributes. Several similarity measures have been proposed in the literature, but the appropriate choice depends on the context and the dataset at hand. In this paper, we address the following question: given a set of measures, which one is best suited for clustering a particular dataset? We propose an approach to automate this choice and present an empirical study on categorical datasets to evaluate it.
Contributor : Guilherme Alves
Submitted on : Friday, December 13, 2019 - 5:23:17 PM
Last modification on : Thursday, February 3, 2022 - 3:54:08 PM




  • HAL Id : hal-02410221, version 1


Guilherme Alves, Miguel Couceiro, Amedeo Napoli. Sélection de mesures de similarité pour les données catégorielles. EGC 2020 - 20ème édition de la conférence Extraction et Gestion des Connaissances, Jan 2020, Bruxelles, Belgique. ⟨hal-02410221⟩


