Statistical inference and data mining: false discoveries control

Stéphane Lallich; Olivier Teytaud; Elie Prudhomme

Conference paper, Year: 2006


Stéphane Lallich

Role: Author

Equipe de Recherche en Ingénierie des Connaissances

Olivier Teytaud

Role: Author
PersonId: 581
IdHAL: olivier-teytaud
IdRef: 05971008X

Algorithmic number theory for cryptology

Elie Prudhomme

Role: Author

Equipe de Recherche en Ingénierie des Connaissances

Abstract

Data mining is characterised by its ability to process large amounts of data. Among these are the data "features": variables, or association rules that can be derived from them. Selecting the most interesting features is a classical data mining problem. That selection requires a large number of tests, which give rise to a number of false discoveries. This paper proposes an original nonparametric method for controlling them. A new criterion, UAFWER, defined as the risk of exceeding a pre-set number of false discoveries, is controlled by BS FD, a bootstrap-based algorithm that can be used on one- or two-sided problems. The usefulness of the procedure is illustrated by the selection of differentially interesting association rules on genetic data.
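To make the criterion concrete: the abstract describes bounding the risk that the number of false discoveries exceeds a pre-set count. The Python sketch below is one generic way to do this with a bootstrap and is not the authors' BS FD algorithm, whose details are not given on this page. Everything in it is an assumption for illustration: the names `bootstrap_threshold` and `select_features`, the parameters `v0` (tolerated number of false discoveries) and `delta` (risk level), the use of column means as two-sided test statistics, and the rule of taking the (1 - delta) quantile of the (v0 + 1)-th largest centered bootstrap statistic.

```python
import math
import random


def bootstrap_threshold(data, v0, delta, n_boot=500, seed=0):
    """Hypothetical helper: threshold such that, among the bootstrap
    resamples, at most a fraction delta have more than v0 centered
    statistics strictly above it.

    data: list of n rows, each a list of m feature values.
    """
    n, m = len(data), len(data[0])
    if not 0 <= v0 < m:
        raise ValueError("v0 must satisfy 0 <= v0 < number of features")
    rng = random.Random(seed)
    col_means = [sum(row[j] for row in data) / n for j in range(m)]

    kth_largest = []
    for _ in range(n_boot):
        # Resample rows with replacement, keeping features together.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        # Centering on the observed means mimics the null hypothesis;
        # the absolute value makes the statistic two-sided.
        stats = [
            abs(sum(row[j] for row in sample) / n - col_means[j])
            for j in range(m)
        ]
        stats.sort(reverse=True)
        # More than v0 statistics exceed t exactly when this value does.
        kth_largest.append(stats[v0])

    kth_largest.sort()
    # Empirical (1 - delta) quantile of the (v0 + 1)-th largest statistic.
    idx = min(n_boot - 1, math.ceil((1 - delta) * n_boot) - 1)
    return kth_largest[idx]


def select_features(data, threshold):
    """Indices of features whose observed |mean| exceeds the threshold."""
    n, m = len(data), len(data[0])
    return [
        j for j in range(m)
        if abs(sum(row[j] for row in data) / n) > threshold
    ]
```

A typical call would be `t = bootstrap_threshold(data, v0=2, delta=0.05)` followed by `select_features(data, t)`. Raising `v0` can only lower the threshold for a fixed seed, which is the trade-off the criterion expresses: tolerating a few false discoveries in exchange for more power.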

Domains

Biotechnology; Computer Science

Main file

compstat.pdf (185.65 KB)

Contributor: Olivier Teytaud

https://hal.science/hal-00113593

Submitted on: Monday, November 13, 2006, 21:03:26

Last modified on: Friday, March 24, 2023, 14:52:48

Long-term archiving on: Tuesday, April 6, 2010, 22:30:13

Dates and versions

hal-00113593, version 1 (13-11-2006)

Identifiers

HAL Id: hal-00113593, version 1

Cite

Stéphane Lallich, Olivier Teytaud, Elie Prudhomme. Statistical inference and data mining: false discoveries control. 2006, 12 p. ⟨hal-00113593⟩


Collections

X CNRS INRIA UNIV-LYON2 LIX X-LIX X-DEP-INFO PARISTECH ERIC INRIA2 UDL

338 views

450 downloads
