Using Resampling Techniques for Better Quality Discretization - HAL open archive
Conference paper, Year: 2009

Using Resampling Techniques for Better Quality Discretization

Abstract

Many supervised induction algorithms require discrete data; however, real data often comes in both discrete and continuous formats. Quality discretization of continuous attributes is an important problem that affects the accuracy, complexity, variance, and understandability of the induction model. Usually, discretization and other statistical processes are applied to subsets of the population because the entire population is practically inaccessible. For this reason, we argue that a discretization performed on a sample is only an estimate of the discretization of the entire population. Most existing discretization methods partition the attribute range into two or more intervals using a single cut point or a set of cut points. In this paper, we introduce two variants of a resampling technique (such as the bootstrap) to generate a set of candidate discretization points, thereby improving discretization quality by providing a better estimate with respect to the entire population. The goal of this paper is to observe whether this type of resampling can lead to better quality discretization points, which opens up a new paradigm for the construction of soft decision trees.
No file deposited
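
The general idea of generating candidate cut points by resampling can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not specify the two variants, so the sketch assumes a plain bootstrap (sampling with replacement) and, as the per-sample criterion, the single binary cut that minimizes weighted class entropy. The function names `entropy`, `best_cut_point`, and `bootstrap_cut_points`, as well as the parameters `n_resamples` and `seed`, are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Return the binary cut minimizing weighted class entropy, or None.

    Assumed criterion for this sketch; the paper may use a different one.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between identical attribute values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut, best_score = (lo + hi) / 2, score
    return best_cut


def bootstrap_cut_points(values, labels, n_resamples=200, seed=0):
    """Collect one candidate cut point per bootstrap resample."""
    rng = random.Random(seed)
    n = len(values)
    cuts = []
    for _ in range(n_resamples):
        # Draw n indices with replacement (one bootstrap resample).
        idx = [rng.randrange(n) for _ in range(n)]
        cut = best_cut_point([values[i] for i in idx], [labels[i] for i in idx])
        if cut is not None:  # None when the resample has a single distinct value
            cuts.append(cut)
    return cuts
```

Calling `bootstrap_cut_points(values, labels)` on a numeric attribute and its class labels returns a list of candidate cut points whose spread gives a rough picture of how much the chosen cut varies from sample to sample. How these candidates are then combined into a final discretization is left open in this sketch.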

Dates and versions

hal-00923582, version 1 (03-01-2014)

Identifiers

  • HAL Id: hal-00923582, version 1

Cite

Taimur Qureshi, Djamel Abdelkader Zighed. Using Resampling Techniques for Better Quality Discretization. 6th International Conference on Machine Learning and Data Mining (MLDM'09), Jul 2009, Leipzig, Germany. pp.68-81. ⟨hal-00923582⟩
