Conference paper, Year: 2006

Statistical inference and data mining: false discoveries control

Abstract

Data mining is characterised by its ability to process large amounts of data. Among these are the data "features": variables, or association rules that can be derived from them. Selecting the most interesting features is a classical data mining problem. That selection requires a large number of tests, which in turn give rise to a number of false discoveries. This paper proposes an original non-parametric method for controlling them. A new criterion, UAFWER, defined as the risk of exceeding a pre-set number of false discoveries, is controlled by BS FD, a bootstrap-based algorithm that can be used on one- or two-sided problems. The usefulness of the procedure is illustrated by the selection of differentially interesting association rules on genetic data.
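To make the criterion concrete, the sketch below computes the risk of exceeding a pre-set number of false discoveries in a deliberately simplified setting. It assumes that all m tested features are truly uninteresting and that the tests are independent, each run at level alpha, so the number of false discoveries follows a binomial distribution. These assumptions, and the names `prob_exceeding`, `m`, `alpha` and `v0`, are illustrative only and do not come from the paper. This is not the BS FD algorithm, which is non-parametric and bootstrap-based.

```python
from math import comb


def prob_exceeding(m: int, alpha: float, v0: int) -> float:
    """Return P(V > v0) for V ~ Binomial(m, alpha).

    V is the number of false discoveries among m independent tests of
    true null hypotheses, each performed at level alpha (an assumption
    made for illustration, not the setting of the paper).
    """
    # Cumulative probability of at most v0 false discoveries.
    p_at_most = sum(
        comb(m, k) * alpha**k * (1 - alpha) ** (m - k)
        for k in range(v0 + 1)
    )
    return 1.0 - p_at_most


if __name__ == "__main__":
    # Hypothetical example: 1000 tests at the 5% level.
    for v0 in (40, 50, 60):
        print(v0, prob_exceeding(1000, 0.05, v0))
```

Under these assumptions the expected number of false discoveries is m × alpha, so the risk falls as the tolerated count `v0` rises above that value. A control procedure has to keep this risk below a chosen level without relying on the independence assumption made here.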
Main file
compstat.pdf (185.65 KB)

Dates and versions

hal-00113593, version 1 (13-11-2006)

Identifiers

  • HAL Id: hal-00113593, version 1

Cite

Stéphane Lallich, Olivier Teytaud, Elie Prudhomme. Statistical inference and data mining: false discoveries control. 2006, 12 p. ⟨hal-00113593⟩