
CALDERA: Finding all significant de Bruijn subgraphs for bacterial GWAS

Abstract : Genome-wide association studies (GWAS), which aim to find genetic variants associated with a trait, have been widely used on bacteria to identify genetic determinants of drug resistance or hypervirulence. Recent bacterial GWAS methods usually rely on k-mers, whose presence in a genome can denote variants ranging from single nucleotide polymorphisms to mobile genetic elements. Since many bacterial species include genes that are not shared among all strains, this approach avoids the reliance on a common reference genome. However, the same gene can exist in slightly different versions across different strains, leading to diluted effects when trying to detect its association with a phenotype through k-mer-based GWAS. Here we propose to overcome this by testing covariates built from closed connected subgraphs of the de Bruijn graph (DBG) defined over genomic k-mers. These covariates are able to capture polymorphic genes as a single entity, improving k-mer-based GWAS in terms of power and interpretability. As the number of subgraphs is exponential in the number of nodes in the DBG, a method naively testing all possible subgraphs would result in very low statistical power due to multiple testing corrections, and the mere exploration of these subgraphs would quickly become computationally intractable. The concept of testable hypotheses has successfully been used to address both problems in similar contexts. We leverage this concept to test all closed connected subgraphs by proposing a novel enumeration scheme for these objects which fully exploits the pruning opportunity offered by testability, resulting in drastic improvements in computational efficiency. We illustrate this on both real and simulated datasets and also demonstrate how considering subgraphs leads to a more powerful and interpretable method. Our method integrates with existing visual tools to facilitate interpretation. We also provide an implementation of our method, as well as code to reproduce all results, at https://github.com/HectorRDB/Caldera_Recomb.
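The ideas in the abstract can be illustrated with a toy sketch. This is not the CALDERA implementation: all function names are hypothetical, reverse complements and unitig compaction are ignored, closedness of subgraphs is not checked, the subgraph covariate is assumed to be "the genome carries at least one k-mer of the subgraph", and a one-sided Fisher exact test is assumed purely to show what a minimal attainable p-value (the quantity behind testability) looks like.

```python
from math import comb


def kmers(seq, k):
    """Set of all k-mers occurring in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_dbg(genomes, k):
    """Toy de Bruijn graph over the k-mers of all genomes.

    Returns (presence, adj): presence[kmer] is the set of genome indices
    containing the k-mer; adj is an undirected adjacency map linking
    k-mers that overlap by k-1 characters.
    """
    presence = {}
    for idx, genome in enumerate(genomes):
        for km in kmers(genome, k):
            presence.setdefault(km, set()).add(idx)
    adj = {km: set() for km in presence}
    for km in presence:
        for base in "ACGT":
            nxt = km[1:] + base
            if nxt in presence and nxt != km:
                adj[km].add(nxt)
                adj[nxt].add(km)
    return presence, adj


def is_connected(nodes, adj):
    """Check that a set of k-mers induces a connected subgraph."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u] & nodes:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def subgraph_covariate(nodes, presence, n_genomes):
    """Assumed covariate: 1 if a genome carries any k-mer of the subgraph."""
    carriers = set().union(*(presence[km] for km in nodes))
    return [1 if i in carriers else 0 for i in range(n_genomes)]


def min_attainable_pvalue(x, n_cases, n):
    """Smallest one-sided Fisher p-value reachable by a pattern with x carriers.

    It is attained when as many carriers as possible are cases. A pattern
    whose minimal p-value exceeds the significance threshold can never be
    significant, so it need not be tested or counted in the correction.
    """
    a = min(x, n_cases)
    return comb(n_cases, a) * comb(n - n_cases, x - a) / comb(n, x)


# Two strains carry slightly different versions of the same short "gene".
genomes = ["ATGCAT", "ATGGAT", "CCCCCC", "CCCCCC"]
presence, adj = build_dbg(genomes, k=3)

variant_a = subgraph_covariate({"TGC"}, presence, len(genomes))
variant_b = subgraph_covariate({"TGG"}, presence, len(genomes))
gene = {"TGC", "ATG", "TGG"}
gene_cov = subgraph_covariate(gene, presence, len(genomes))

print(variant_a, variant_b, gene_cov, is_connected(gene, adj))
```

In this toy example each variant-specific k-mer flags only one of the two carrier strains, while the connected subgraph flags both. With two cases among four genomes, a pattern with one carrier has a minimal attainable p-value of 0.5, against 1/6 for a pattern with two carriers, which gives a small numerical picture of the dilution effect the abstract describes.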

https://hal.archives-ouvertes.fr/hal-03433563
Contributor : Laurent Jacob
Submitted on : Wednesday, November 17, 2021 - 6:44:39 PM
Last modification on : Tuesday, May 17, 2022 - 2:50:02 PM

Citation

Hector Roux de Bézieux, Leandro Lima, Fanny Perraudeau, Arnaud Mary, Sandrine Dudoit, et al.. CALDERA: Finding all significant de Bruijn subgraphs for bacterial GWAS. 2021. ⟨hal-03433563⟩
