Conference papers

Sélection de mesures de similarité pour les données catégorielles (Selection of similarity measures for categorical data)

Guilherme Alves 1 Miguel Couceiro 1 Amedeo Napoli 1
1 ORPAILLEUR - Knowledge representation, reasoning
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Data clustering is a well-known data mining task that often relies on distances or, in some cases, on similarity measures. The latter is the case for real-world datasets that comprise categorical attributes. Several similarity measures have been proposed in the literature; however, the choice among them depends on the context and the dataset at hand. In this paper, we address the following question: given a set of measures, which one is best suited for clustering a particular dataset? We propose an approach to automate this choice, and we present an empirical study on categorical datasets in which we evaluate the proposed approach.
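The abstract does not describe how the selection is performed, so the following is only an illustrative sketch of the question it poses, not the authors' approach. It assumes two well-known categorical similarity measures (overlap and Eskin), a tiny deterministic k-medoids as a stand-in clustering algorithm, and the mean silhouette as a rough quality proxy. The function names (`overlap`, `eskin`, `select_measure`, and the rest) and the toy dataset are hypothetical.

```python
from itertools import combinations


def overlap(x, y, domains):
    """Overlap similarity: fraction of attributes on which x and y agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def eskin(x, y, domains):
    """Eskin similarity: a match scores 1, a mismatch on attribute k scores
    n_k^2 / (n_k^2 + 2), where n_k is the number of distinct values of k."""
    total = 0.0
    for a, b, n in zip(x, y, domains):
        total += 1.0 if a == b else n * n / (n * n + 2)
    return total / len(x)


def distance_matrix(data, measure):
    """Pairwise distances defined as 1 - similarity."""
    domains = [len(set(col)) for col in zip(*data)]
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = 1.0 - measure(data[i], data[j], domains)
    return d


def k_medoids(d, k, max_iter=100):
    """Minimal deterministic k-medoids: the first k points are the seeds."""
    n = len(d)
    medoids = list(range(k))
    labels = [0] * n
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
        new = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            # Keep the old medoid if a cluster ends up empty.
            new.append(min(members, key=lambda m: sum(d[m][i] for i in members))
                       if members else medoids[c])
        if new == medoids:
            break
        medoids = new
    return labels


def mean_silhouette(d, labels):
    """Mean silhouette width computed from a precomputed distance matrix."""
    n = len(d)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(d[i][j] for j in own) / len(own)
        b = min(sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


def select_measure(data, measures, k):
    """Cluster once per measure and return the best-scoring measure name."""
    scored = {}
    for name, measure in measures.items():
        d = distance_matrix(data, measure)
        scored[name] = mean_silhouette(d, k_medoids(d, k))
    return max(scored, key=scored.get), scored


# Hypothetical toy dataset with three categorical attributes.
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("green", "large", "square"),
]
best, scores = select_measure(data, {"overlap": overlap, "eskin": eskin}, k=2)
print(best, scores)
```

An internal index such as the silhouette is computed under each measure's own distances, so comparing its values across measures is at best a heuristic; an evaluation against known class labels, where available, would be a natural alternative.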

Cited literature [21 references]

https://hal.archives-ouvertes.fr/hal-02410221
Contributor : Guilherme Alves
Submitted on : Friday, December 13, 2019 - 5:23:17 PM
Last modification on : Friday, January 29, 2021 - 10:26:02 AM

File

ga-etal-egcf-2020.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02410221, version 1

Citation

Guilherme Alves, Miguel Couceiro, Amedeo Napoli. Sélection de mesures de similarité pour les données catégorielles. EGC 2020 - 20ème édition de la conférence Extraction et Gestion des Connaissances, Jan 2020, Bruxelles, Belgique. ⟨hal-02410221⟩

Metrics

Record views : 126
File downloads : 766