Modeling genetical data with forests of latent trees for applications in association genetics at a large scale. Which clustering method should be chosen?

Abstract: Association genetics, and in particular genome-wide association studies (GWASs), aim to elucidate the etiology of complex genetic diseases. In association genetics, machine learning provides an appealing alternative to standard statistical approaches. Pioneering work (Mourad et al., 2011) proposed the forest of latent trees (FLTM) to model genetic data at the genome scale. The FLTM is a hierarchical Bayesian network with latent variables. A key step in FLTM construction is the recursive clustering of variables in a bottom-up subsuming process. In this paper, we study how the choice of the clustering method plugged into the FLTM learning algorithm affects the results in a GWAS context. Using a real GWAS data set describing 41400 variables for each of 3004 controls and 2005 individuals affected by Crohn's disease, we compare the influence of three clustering methods. We analyze data dimension reduction and the ability to split or group putative causal SNPs in agreement with the underlying biological reality. To assess the risk of missing significant association results through subsumption, we also compare the methods through the corresponding FLTM-driven GWASs. In the GWAS context and in this framework, the choice of the clustering method does not affect the satisfactory performance of the downstream application, in terms of both power and detection of false positive associations.
Document type :
Conference papers
https://hal.archives-ouvertes.fr/hal-01084907
Contributor : Christine Sinoquet
Submitted on : Thursday, November 20, 2014 - 12:29:58 PM
Last modification on : Thursday, January 17, 2019 - 10:40:04 AM

Identifiers

  • HAL Id : hal-01084907, version 1

Citation

Duc-Thanh Phan, Philippe Leray, Christine Sinoquet. Modeling genetical data with forests of latent trees for applications in association genetics at a large scale. Which clustering method should be chosen? International Conference on Bioinformatics Models, Methods and Algorithms, Bioinformatics2015, Nov 2014, Lisbon, Portugal. pp.12. ⟨hal-01084907⟩
