OLOGRAM-MODL: mining enriched n-wise combinations of genomic features with Monte Carlo and dictionary learning
Journal article, NAR Genomics and Bioinformatics, Year: 2021

Abstract

Most epigenetic marks, such as transcriptional regulators or histone marks, are biological objects known to work together in n-wise complexes. A suitable way to infer such functional associations between them is to study the overlaps of the corresponding genomic regions. However, the statistical significance of n-wise overlaps of genomic features is seldom addressed, which prevents rigorous studies of n-wise interactions. We introduce OLOGRAM-MODL, which considers overlaps between n ≥ 2 sets of genomic regions and computes their statistical mutual enrichment by Monte Carlo fitting of a Negative Binomial distribution, yielding P-values with finer resolution. An optional machine learning method is proposed to find complexes of interest, using a new itemset mining algorithm based on dictionary learning that is resistant to the noise inherent to biological assays. The overall approach is implemented through an easy-to-use command-line interface for workflow integration and a visual tree-based representation of the results suited for explainability. The viability of the method is studied experimentally using both artificial and biological data. This approach is accessible through the command-line interface of the pygtftk toolkit, available on Bioconda and from https://github.com/dputhier/pygtftk
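
The idea of fitting a Negative Binomial to Monte Carlo shuffles can be illustrated with a minimal Python sketch. This is not the pygtftk implementation. The uniform placement of regions, the method-of-moments fit, the summed pairwise overlap statistic, and all function names and toy parameters below are assumptions made for illustration only. The sketch shows why a fitted distribution can report P-values below the floor of 1/(number of shuffles + 1) that a purely empirical estimate is limited to.

```python
import math
import random
import statistics


def random_regions(rng, n, length, genome_len):
    """Place n fixed-length regions uniformly at random (assumed toy null model)."""
    starts = [rng.randrange(genome_len - length) for _ in range(n)]
    return [(s, s + length) for s in starts]


def overlap_bp(set_a, set_b):
    """Summed pairwise overlap, in base pairs, between two sets of (start, end) regions."""
    return sum(
        max(0, min(ea, eb) - max(sa, sb))
        for sa, ea in set_a
        for sb, eb in set_b
    )


def fit_negative_binomial(values):
    """Method-of-moments fit; returns (r, p) with mean = r * (1 - p) / p."""
    m = statistics.mean(values)
    v = statistics.variance(values)
    if v <= m:
        raise ValueError("variance <= mean: moment fit of a Negative Binomial is undefined")
    return m * m / (v - m), m / v


def nb_log_pmf(k, r, p):
    """Log of the Negative Binomial probability mass at integer k."""
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log1p(-p))


def nb_upper_tail(observed, r, p, rel_tol=1e-12, max_terms=1_000_000):
    """P(X >= observed), summed term by term so that tiny tails keep their precision."""
    mean = r * (1 - p) / p
    total, k = 0.0, observed
    for _ in range(max_terms):
        term = math.exp(nb_log_pmf(k, r, p))
        total += term
        # Past the mean the terms only decrease, so stop once they are negligible.
        if k > mean and term <= rel_tol * total:
            break
        k += 1
    return min(total, 1.0)


rng = random.Random(0)  # fixed seed so the toy run is reproducible
N_SHUFFLES, N_REGIONS, REGION_LEN, GENOME_LEN = 200, 30, 100, 100_000  # toy values

# Null distribution: overlap statistic across independent random placements.
null_stats = [
    overlap_bp(
        random_regions(rng, N_REGIONS, REGION_LEN, GENOME_LEN),
        random_regions(rng, N_REGIONS, REGION_LEN, GENOME_LEN),
    )
    for _ in range(N_SHUFFLES)
]
r, p = fit_negative_binomial(null_stats)

# Hypothetical "observed" overlap, set to ten times the null mean for illustration.
observed = 10 * round(statistics.mean(null_stats))
empirical = (1 + sum(s >= observed for s in null_stats)) / (N_SHUFFLES + 1)
fitted = nb_upper_tail(observed, r, p)
print(f"empirical P-value: {empirical:.3g}, fitted P-value: {fitted:.3g}")
```

The empirical estimate cannot go below 1/(N_SHUFFLES + 1), whereas the fitted tail can, at the cost of assuming that the Negative Binomial describes the null distribution well.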
Main file
lqab114.pdf (1.31 MB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-03549602, version 1 (31-01-2022)

Identifiers

Cite

Quentin Ferré, Cécile Capponi, Denis Puthier. OLOGRAM-MODL: mining enriched n-wise combinations of genomic features with Monte Carlo and dictionary learning. NAR Genomics and Bioinformatics, 2021, 3 (4), ⟨10.1093/nargab/lqab114⟩. ⟨hal-03549602⟩
26 views
36 downloads
