Identification de biomarqueurs et d’ARN non codants par des approches basées sur l’intelligence computationnelle

Abstract: Cancer remains a major health concern worldwide. Cancer classification has traditionally been based on the morphological study of tumors. However, tumors with similar histological appearances can respond differently to therapy, indicating differences in tumor characteristics at the molecular level. Thus, the development of a novel, reliable and accurate method for classifying tumors is essential for more successful diagnosis and treatment. Molecular biomarkers offer new ways of understanding disease processes and the manner in which medicines work to counteract disease. In the last few years, researchers have devoted growing attention to biomarker identification owing to its importance in genomics and personalized medicine.
In this thesis, we address the problem of biomarker discovery at two levels: genomics and transcriptomics. We are first interested in the problem of selecting robust and accurate signatures from gene expression data, which relies heavily on the feature selection algorithms used. The main objective is to achieve high performance in computer-aided diagnosis (CAD) by selecting a few genes with high predictive power and high sensitivity to variations in real clinical tests. For that purpose, we have investigated ensemble-based methods and parallel cooperative metaheuristics, which have received increasing attention because they can deliver higher accuracy and stability than a single algorithm. Accordingly, we propose a parallel ensemble-based feature selection method based on a meta-ensemble of filters (MPME-FS) for biomarker discovery from gene expression data. Then, we propose a hybrid wrapper/filter feature selection method, called CPM-FS, based on the parallel cooperation of metaheuristics and a filter-based mechanism for both the initialization and the repair of solutions.
After that, we propose an ensemble-based wrapper gene selection method based on the previously proposed CPM-FS and a wrapper-based consensus function in order to take gene dependencies into account. Experiments on 12 publicly available cancer datasets have shown that our approaches outperform recent state-of-the-art methods in terms of predictive performance. They also provide robust selection according to the different similarity measures. Biological interpretation of the selected signatures reveals that the proposed methods guarantee the selection of highly informative genes for cancer diagnosis.
In the second part of this thesis, we propose an integrative approach for the prediction of noncoding RNAs, molecules that play an important role in post-transcriptional gene regulation, which highlights their importance as putative markers and their impact on the development and progression of many diseases. The proposed approach examines several types of genomic and epigenomic properties that can be used to characterize these molecules. We have developed a generic tool called IncRId that takes all reviewed heterogeneous features into account in a modular and easily extensible way and could be used and adapted for predicting any type of ncRNA. Our method makes it possible to study the validity of each given feature in each of the candidate species. Then, we present an application example focusing on the prediction of piRNAs. We reviewed and extracted from the literature a large number of piRNA features that have been observed experimentally in several species. We implemented these features in a tool, called IpiRId, to study the relevance of each feature in each of the studied species: human, mouse and fly. IpiRId prediction results attain more than 90% accuracy, outperforming all existing tools.
The IpiRId software and the web server of our tool are freely available to academic users at: https://EvryRNA.ibisc.univ-evry.fr
Document type: Theses

Cited literature [201 references]

https://hal.archives-ouvertes.fr/tel-01769937
Contributor: Frédéric Davesne
Submitted on: Wednesday, April 18, 2018 - 2:34:29 PM
Last modification on: Monday, October 28, 2019 - 10:50:22 AM

File

Thèse_Anouar.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: tel-01769937, version 1

Citation

Anouar Boucheham. Identification de biomarqueurs et d’ARN non codants par des approches basées sur l’intelligence computationnelle. Bio-informatique [q-bio.QM]. Université Constantine 2 - Abdelhamid Mehri, 2016. Français. ⟨tel-01769937⟩
