Discretization of Continuous Attributes

Fabrice Muhlenbach; Ricco Rakotomalala

Book chapter, Year: 2005

Fabrice Muhlenbach

Role: Author
PersonId: 853885

Laboratoire Hubert Curien

Ricco Rakotomalala

Role: Author
PersonId: 860295

Equipe de Recherche en Ingénierie des Connaissances

Abstract

In the data mining field, many learning methods, such as association rules, Bayesian networks and rule induction (Grzymala-Busse & Stefanowski, 2001), can handle only discrete attributes. Therefore, before the machine learning process, each continuous attribute must be re-encoded as a discrete attribute made up of a set of intervals. For example, the age attribute can be transformed into two discrete values representing two intervals: under 18 (a minor) and 18 or over (an adult). This process, known as discretization, is an essential data preprocessing task, not only because some learning methods do not handle continuous attributes, but also for other important reasons: data transformed into a set of intervals are more cognitively relevant for human interpretation (Liu, Hussain, Tan & Dash, 2002); computation runs faster on a reduced amount of data, particularly when some attributes are removed from the representation space of the learning problem because no relevant cut can be found (Mittal & Cheong, 2002); and discretization can capture non-linear relations (e.g., infants and the elderly are more sensitive to illness).
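To make the age example concrete, here is a minimal Python sketch of the re-encoding step only. It assumes the cut points are already known and simply maps each value to the label of its interval. The function name `discretize`, the interval labels, and the convention that a value equal to a cut point belongs to the upper interval are illustrative assumptions. Choosing the cut points is the actual discretization problem, and it is not shown here. The second call uses hypothetical cut points (3 and 65) to show how three intervals can isolate infants and elderly people, the kind of non-linear grouping mentioned above.

```python
from bisect import bisect_right
from collections.abc import Sequence


def discretize(values: Sequence[float], cut_points: Sequence[float],
               labels: Sequence[str]) -> list[str]:
    """Map each continuous value to the label of the interval containing it.

    k sorted cut points define k + 1 intervals:
    (-inf, c1), [c1, c2), ..., [ck, +inf).
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one label per interval (len(cut_points) + 1)")
    if list(cut_points) != sorted(cut_points):
        raise ValueError("cut_points must be sorted in ascending order")
    # bisect_right counts the cut points <= v, which is the interval index,
    # so a value equal to a cut point falls into the interval starting there.
    return [labels[bisect_right(cut_points, v)] for v in values]


# Cut point 18 from the age example: under 18 -> "minor", 18 or over -> "adult".
print(discretize([4, 17, 18, 35, 80], [18], ["minor", "adult"]))
# ['minor', 'minor', 'adult', 'adult', 'adult']

# Hypothetical cut points 3 and 65 (illustrative only), giving three intervals.
print(discretize([1, 17, 40, 70], [3, 65], ["infant", "middle", "elderly"]))
# ['infant', 'middle', 'middle', 'elderly']
```

Using `bisect` keeps the lookup at O(log k) per value for k cut points. This matters little for a single threshold, but it scales when a supervised method produces many intervals.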

Keywords

data mining; data warehousing

Domains

Machine Learning [cs.LG]

Main file

HAL_Chapter_720_Discretization.pdf (157.3 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00383757

Submitted on: Wednesday, 13 May 2009, 16:36:26

Last modified on: Friday, 24 March 2023, 14:52:51

Long-term archiving on: Wednesday, 22 September 2010, 12:25:05

Dates and versions

hal-00383757, version 1 (13-05-2009)

hal-00383757, version 2 (13-05-2009)

Identifiers

HAL Id: hal-00383757, version 2

Cite

Fabrice Muhlenbach, Ricco Rakotomalala. Discretization of Continuous Attributes. John Wang. Encyclopedia of Data Warehousing and Mining, Idea Group Reference, pp.397-402, 2005. ⟨hal-00383757v2⟩

Collections

UNIV-ST-ETIENNE IOGS CNRS UNIV-LYON2 LAHC PARISTECH ERIC UDL ANR

340 views

5949 downloads
