Statistical inference and data mining: false discoveries control

Stéphane Lallich 1, Olivier Teytaud 2, Elie Prudhomme 1
2 TANC - Algorithmic number theory for cryptology
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], Inria Saclay - Ile de France, Polytechnique - X, CNRS - Centre National de la Recherche Scientifique : UMR7161
Abstract: Data mining is characterised by its ability to process large amounts of data. Among these are the data "features": variables, or association rules that can be derived from them. Selecting the most interesting features is a classical data mining problem. That selection requires a large number of tests, which give rise to a number of false discoveries. This paper proposes an original nonparametric method for controlling them. A new criterion, UAFWER, defined as the risk of exceeding a pre-set number of false discoveries, is controlled by BS FD, a bootstrap-based algorithm that can be used on one- or two-sided problems. The usefulness of the procedure is illustrated by the selection of differentially interesting association rules on genetic data.
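The criterion described in the abstract can be written compactly. As a sketch only, assuming $V$ denotes the (random) number of false discoveries among the selected features, $V_0$ the pre-set number that is tolerated, and $\delta$ the target risk level (these symbols are notational assumptions and do not come from this record):

```latex
% Notation assumed for illustration:
%   V       = number of false discoveries among the selected features
%   V_0     = pre-set tolerated number of false discoveries
%   \delta  = target risk level
\[
  \mathrm{UAFWER} = \Pr(V > V_0),
  \qquad \text{controlled at level } \delta \text{ when } \Pr(V > V_0) \le \delta .
\]
```

Under this notation, setting $V_0 = 0$ gives $\Pr(V > 0)$, the usual family-wise error rate.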
Document type:
Conference paper
IASC. 2006, Springer-Verlag, 12 p., 2006

https://hal.archives-ouvertes.fr/hal-00113593
Contributor: Olivier Teytaud
Submitted on: Monday, 13 November 2006 - 21:03:26
Last modified on: Thursday, 11 January 2018 - 06:22:14
Document(s) archived on: Tuesday, 6 April 2010 - 22:30:13

Identifiers

  • HAL Id : hal-00113593, version 1

Citation

Stéphane Lallich, Olivier Teytaud, Elie Prudhomme. Statistical inference and data mining: false discoveries control. IASC. 2006, Springer-Verlag, 12 p., 2006. 〈hal-00113593〉
