Inference of protein-protein interaction networks by statistical learning (original title: Inférence de réseaux d'interaction protéine-protéine par apprentissage statistique)

Abstract: The aim of this thesis is to develop tools for predicting interactions between proteins that can be applied to the human proteins forming a network with the CFTR protein. This protein, when defective, is involved in cystic fibrosis. In silico prediction methods can help biologists by suggesting new interaction targets and by better explaining the functions of the proteins in the network. We propose a new method to solve the link prediction problem. To benefit from the information in unlabeled data, we work in the semi-supervised learning framework. Link prediction is addressed as an output kernel learning task, referred to as Output Kernel Regression. An output kernel is assumed to encode the proximities of nodes in the target graph, and the goal is to approximate this kernel using appropriate input features. Using the kernel trick in the output space reduces the problem of learning from pairs to learning a function of a single variable with output values in a Hilbert space. By choosing candidate regression functions in a reproducing kernel Hilbert space with operator-valued kernels, we develop regularization tools analogous to those available for scalar-valued functions. We establish representer theorems in the supervised and semi-supervised cases and use them to define new regression models for different cost functions, called IOKR-ridge and IOKR-margin. We first tested the approach on transductive link prediction using artificial data, benchmark data, and a protein-protein interaction network of the yeast S. cerevisiae, and we obtained very good results. We then applied it to the prediction of protein interactions in a network built around the CFTR protein.
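The idea of approximating an output kernel from input features can be illustrated with a minimal sketch. It covers only the supervised ridge case and assumes the simplest operator-valued kernel, a scalar input kernel times the identity, for which the predicted output kernel has a closed form. The Gaussian input kernel, the diffusion output kernel, the function names, and the parameters `gamma`, `beta`, `lam`, and `theta` are all illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """Scalar input kernel k_x between rows of A and rows of B (assumed choice)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def diffusion_kernel(adjacency, beta=0.5):
    """Output kernel exp(-beta * L) on the training graph, L = D - A (assumed choice)."""
    laplacian = np.diag(adjacency.sum(1)) - adjacency
    w, V = np.linalg.eigh(laplacian)
    return (V * np.exp(-beta * w)) @ V.T

def fit_okr_ridge(K_x, lam):
    """Return M = (K_x + lam * I)^{-1}, the only quantity the ridge solution needs."""
    n = K_x.shape[0]
    return np.linalg.solve(K_x + lam * np.eye(n), np.eye(n))

def predict_output_kernel(k_a, k_b, K_y, M):
    """Approximate output kernel <h(x), h(x')> for two sets of points.

    k_a, k_b hold input-kernel values between the query points and the
    training points; the result is k_a M K_y M k_b^T.
    """
    return k_a @ M @ K_y @ M @ k_b.T

def predict_links(k_a, k_b, K_y, M, theta):
    """Predict a link wherever the approximated kernel exceeds the threshold theta."""
    return predict_output_kernel(k_a, k_b, K_y, M) > theta

# Toy usage: six nodes on a path graph, described by one input feature each.
X_train = np.arange(6, dtype=float)[:, None] * 2.0
A_train = np.diag(np.ones(5), 1) + np.diag(np.ones(5), -1)
K_x = gaussian_kernel(X_train, X_train)
K_y = diffusion_kernel(A_train)
M = fit_okr_ridge(K_x, lam=1e-2)

X_new = np.array([[1.0], [9.0]])          # unseen nodes
k_new = gaussian_kernel(X_new, X_train)   # shape (2, 6)
links_to_train = predict_links(k_new, K_x, K_y, M, theta=0.2)
```

The pairwise problem never appears explicitly: the model is fitted on single nodes, and a score for any pair is obtained afterwards as an inner product of two predictions. The semi-supervised and margin-based variants mentioned in the abstract are not covered by this sketch.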
https://tel.archives-ouvertes.fr/tel-00845692
Contributor: Céline Brouard
Submitted on: Wednesday, July 17, 2013 - 3:39:18 PM
Last modification on: Monday, October 28, 2019 - 10:50:21 AM
Long-term archiving on: Wednesday, April 5, 2017 - 1:23:11 PM

Identifiers

  • HAL Id: tel-00845692, version 1

Citation

Céline Brouard. Inférence de réseaux d'interaction protéine-protéine par apprentissage statistique. Machine Learning [cs.LG]. Université d'Evry-Val d'Essonne, 2013. French. ⟨NNT : 2013EVRY0006⟩. ⟨tel-00845692⟩

Metrics

  • Record views: 534
  • File downloads: 2622