A screening methodology based on random forests to improve the detection of gene-gene interactions

Lizzy de Lobel; Pierre Geurts; Guy Baele; Francesc Castro-Giner; Manolis Kogevinas; Kristel van Steen

doi:10.1038/ejhg.2010.48

Journal article, European Journal of Human Genetics, Year: 2010

Lizzy de Lobel

Role: Corresponding author
PersonId: 882590

Department of Applied Mathematics and Computer Science [Ghent]

Pierre Geurts

Role: Author

Department of Applied Mathematics and Computer Science [Ghent]

Guy Baele

Role: Author
PersonId: 761656
ORCID: 0000-0002-1915-7732

Department of Applied Mathematics and Computer Science [Ghent]

Francesc Castro-Giner

Role: Author

Department of Applied Mathematics and Computer Science [Ghent]

Manolis Kogevinas

Role: Author
PersonId: 762958
ORCID: 0000-0002-9605-0461

Department of Applied Mathematics and Computer Science [Ghent]

Kristel van Steen

Role: Author

Department of Applied Mathematics and Computer Science [Ghent]

Abstract

The search for susceptibility loci in gene-gene interactions poses a methodological and computational challenge for statisticians because of the high dimensionality inherent in modelling gene-gene interactions, or epistasis. In an era where genome-wide scans have become relatively common, powerful new methods are required to handle the huge number of possible gene-gene interactions and to weed out false positives and false negatives from the results. One solution to the dimensionality problem is to reduce the data by preliminary screening of markers, selecting the best candidates for further analysis. Ideally, this screening step is statistically independent of the testing phase. Initially developed for small numbers of markers, the Multifactor Dimensionality Reduction method is a nonparametric, model-free data reduction technique that identifies the sets of markers that best predict disease. In this study, we examine the power of Multifactor Dimensionality Reduction in larger datasets and compare it with other approaches that are able to identify gene-gene interactions. Under a variety of interaction models (purely and not purely epistatic), we apply a Random Forests-based pre-screening method before running Multifactor Dimensionality Reduction in order to improve its performance. We find that the power of Multifactor Dimensionality Reduction increases when noisy SNPs are first removed by using Random Forests to build a collection of candidate markers. We validate our technique through extensive simulation studies and by application to asthma data from the ECRHS II study.
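
The two-stage idea described in the abstract (screen first, then search for interactions among the surviving markers) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `mdr_balanced_accuracy` and `best_pair`, the toy data, and the `screened` candidate list are all hypothetical. The scoring step follows the usual description of Multifactor Dimensionality Reduction (genotype cells are pooled into high-risk and low-risk groups by their case:control ratio), but it omits the cross-validation that a real analysis would use, and the screened list is simply given rather than computed by Random Forests.

```python
from collections import Counter
from itertools import combinations, product


def mdr_balanced_accuracy(genotypes, labels, pair, threshold=None):
    """Score one SNP pair by pooling its genotype cells into high/low risk."""
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    # Assumed default threshold: the overall case:control ratio.
    if threshold is None:
        threshold = n_cases / n_controls
    i, j = pair
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        cell = (row[i], row[j])
        (cases if y == 1 else controls)[cell] += 1
    # A cell is high-risk when its case:control ratio exceeds the threshold.
    high_risk = {c for c in set(cases) | set(controls)
                 if cases[c] > threshold * controls[c]}
    true_pos = sum(cases[c] for c in high_risk)
    true_neg = sum(n for c, n in controls.items() if c not in high_risk)
    return 0.5 * (true_pos / n_cases + true_neg / n_controls)


def best_pair(genotypes, labels, candidates):
    """Exhaustively score every pair drawn from the candidate SNP indices."""
    return max(combinations(sorted(candidates), 2),
               key=lambda p: mdr_balanced_accuracy(genotypes, labels, p))


# Toy data (hypothetical): 4 SNPs coded 0/1/2, every genotype combination once.
# Disease status depends only on SNPs 0 and 2 (odd sum of the two genotypes).
genotypes = list(product(range(3), repeat=4))
labels = [(g[0] + g[2]) % 2 for g in genotypes]

# Hypothetical output of a screening step that dropped the noise SNP 1.
screened = [0, 2, 3]
print(best_pair(genotypes, labels, screened))
```

The point of screening is visible in `best_pair`: the exhaustive search covers all pairs among the candidates, so shrinking the candidate list from p markers to k markers reduces the number of pairs from p(p-1)/2 to k(k-1)/2.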

Keywords

gene-gene interactions; pre-screening; Random Forests

Main file

PEER_stage2_10.1038%2Fejhg.2010.48.pdf (227.28 KB)

Origin: Files produced by the author(s)

Contributor: Hal Peer

https://hal.science/hal-00535576

Submitted on: Friday, 12 November 2010, 02:51:29

Last modified on: Tuesday, 14 November 2023, 11:58:06

Long-term archiving on: Saturday, 3 December 2016, 00:50:40

Dates and versions

hal-00535576, version 1 (12-11-2010)

Identifiers

HAL Id: hal-00535576, version 1
DOI: 10.1038/ejhg.2010.48

Cite

Lizzy de Lobel, Pierre Geurts, Guy Baele, Francesc Castro-Giner, Manolis Kogevinas, et al. A screening methodology based on random forests to improve the detection of gene-gene interactions. European Journal of Human Genetics, 2010, ⟨10.1038/ejhg.2010.48⟩. ⟨hal-00535576⟩

Collections

PEER

39 Views

90 Downloads
