Modeling gene expression from DNA sequence data

Abstract : Gene expression is tightly controlled to ensure a wide variety of cell types and functions. The development of diseases, particularly cancers, is invariably linked to deregulation of these controls. Our objective is to model the link between gene expression and the nucleotide composition of different regulatory regions of the genome. We address this problem in a regression framework, using a Lasso approach coupled with a regression tree. We use sequence data exclusively and fit a separate model for each cell type. We show that (i) different regulatory regions provide distinct and complementary information and that (ii) the information contained in nucleotide composition alone predicts gene expression with an error comparable to that obtained using experimental data. Moreover, the fitted linear model does not perform equally well for all genes: it fits certain groups of genes with particular nucleotide compositions better than others.
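
To make the regression setting concrete, here is a minimal sketch of how nucleotide composition could be turned into features and fed to a Lasso fit. It is not the authors' implementation: the function names `composition`, `soft_threshold` and `lasso_cd`, the choice of k-mer frequencies as features, and the plain coordinate-descent solver are all assumptions made for illustration, and the regression-tree step mentioned in the abstract is left out.

```python
from collections import Counter
from itertools import product


def composition(seq, k=1):
    """Frequency of every k-mer over A/C/G/T in seq, in a fixed order.

    Hypothetical feature extractor: k=1 gives nucleotide frequencies,
    k=2 gives dinucleotide frequencies. K-mers containing other
    symbols (for example N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def soft_threshold(r, a):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if r > a:
        return r - a
    if r < -a:
        return r + a
    return 0.0


def lasso_cd(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    No intercept is fitted: centre y and the columns of X beforehand.
    X is a list of rows (one row per gene), y a list of expression values.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # constant-zero feature, coefficient stays at 0
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w
```

Under these assumptions, each gene would contribute one row built by concatenating `composition(...)` vectors for its regulatory regions, and one such model would be fitted per cell type. The L1 penalty `alpha` sets weakly informative coefficients exactly to zero, which is what makes the selected composition features interpretable.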
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-02068289
Contributor : May Taha
Submitted on : Thursday, March 14, 2019 - 5:39:34 PM
Last modification on : Saturday, March 30, 2019 - 2:05:12 AM
Long-term archiving on : Saturday, June 15, 2019 - 8:18:04 PM

File

abstract_SFDS.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02068289, version 1

Citation

May Taha, Chloé Bessière, Florent Petitprez, Jimmy Vandel, Jean-Michel Marin, et al. Modélisation de l'expression des gènes à partir de données de séquence ADN. JdS 2017, 49èmes Journées de Statistique de la SFdS, May 2017, Avignon, France. ⟨hal-02068289⟩
