Journal Article, Intelligent Data Analysis, Year: 2015

A fast and flexible instance selection algorithm adapted to non-trivial database sizes

Abstract

This paper proposes a new instance selection algorithm for classification, designed to handle non-trivial database sizes. The algorithm is hybrid and needs only a few parameters, which directly control the balance between the three objectives of classification: errors, storage requirements and runtime. It combines neighborhood and stratification mechanisms that reduce runtime without significantly degrading efficiency. Instead of applying an instance selection (IS) algorithm to the whole database, IS is applied to strata derived from regions, each region being a set of patterns selected from the original training set. Whether IS is applied depends on the purity of each region (i.e. the extent to which different categories of patterns are mixed in the region), and the stratification strategy is adapted to the region components. For each region, the number of delivered instances is first limited by an iterative process that takes the boundary complexity into account, and then optimized by removing superfluous instances. The instance sets obtained from all the regions are merged into an intermediate instance set, which undergoes a dedicated filtering process to deliver the final set. Experiments performed with various synthetic and real data sets demonstrate the advantages of the proposed approach.
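
To make the region and purity idea concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not say how regions are built, which IS algorithm is used, or how the stratification, the iterative boundary-complexity step and the final filtering work. The sketch assumes regions formed by nearest-seed assignment (the `seeds` argument is hypothetical), uses Hart's condensed nearest neighbour rule as a stand-in IS algorithm, and omits stratification and final filtering entirely. The names `assign_regions`, `select_instances` and `purity_threshold` are illustrative only.

```python
import math
from collections import Counter


def purity(labels):
    """Fraction of a region's patterns that belong to its majority class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def condensed_nn(points, labels):
    """Hart's condensed nearest neighbour (stand-in IS algorithm):
    add every pattern that the current subset misclassifies with 1-NN."""
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(points)):
            if i in keep:
                continue
            nearest = min(keep, key=lambda j: math.dist(points[i], points[j]))
            if labels[nearest] != labels[i]:
                keep.append(i)
                changed = True
    return keep


def assign_regions(points, seeds):
    """Assumed region builder: each pattern joins its nearest seed."""
    regions = {}
    for i, p in enumerate(points):
        r = min(range(len(seeds)), key=lambda s: math.dist(p, seeds[s]))
        regions.setdefault(r, []).append(i)
    return regions


def select_instances(points, labels, seeds, purity_threshold=1.0):
    """Apply IS per region, conditioned on region purity, then merge."""
    selected = []
    for idx in assign_regions(points, seeds).values():
        region_pts = [points[i] for i in idx]
        region_lbl = [labels[i] for i in idx]
        if purity(region_lbl) >= purity_threshold:
            # Pure region: keep one majority-class pattern near the centroid.
            majority = Counter(region_lbl).most_common(1)[0][0]
            centroid = [sum(c) / len(c) for c in zip(*region_pts)]
            candidates = [i for i in idx if labels[i] == majority]
            selected.append(min(candidates,
                                key=lambda i: math.dist(points[i], centroid)))
        else:
            # Mixed region: run the stand-in IS algorithm locally.
            local = condensed_nn(region_pts, region_lbl)
            selected.extend(idx[j] for j in local)
    return sorted(selected)


# Toy data: one pure region near (0, 0), one mixed region near (10, 10).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
seeds = [(0.5, 0.5), (10.5, 10.5)]

selected = select_instances(points, labels, seeds)
print(selected)  # [0, 4, 6]
```

On this toy data the pure region contributes a single representative, while the mixed region keeps one pattern per class on either side of its boundary, so the cost of the IS step is only paid where classes actually overlap.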
Main file
mo2015-pub00044884.pdf (1.55 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01192999, version 1 (04-09-2015)

Identifiers

HAL Id: hal-01192999
DOI: 10.3233/IDA-150736

Cite

F. Ros, Rachid Harba, M. Pintore, S. Guillaume. A fast and flexible instance selection algorithm adapted to non-trivial database sizes. Intelligent Data Analysis, 2015, 19 (3), pp.631-658. ⟨10.3233/IDA-150736⟩. ⟨hal-01192999⟩