Conference poster, Year: 2010

A Data Mining Approach to Highlight Relations Between Functional Modules

Abstract

We propose a data-mining approach for large graphs in which a set of labels is associated with each vertex. This type of data fits biological datasets well, for example a protein/protein interaction graph in which each protein is labeled with the biological situations in which the corresponding gene is over-expressed. Previous work on this type of dataset focuses on finding functional modules, i.e., sets of strongly interacting proteins over-expressed in the same biological situations. Our main contribution relative to that work is to find collections of functional modules. We introduce the problem of extracting Maximal Homogeneous Clique Sets (MHCSs), which are sets of cliques (i.e., complete subgraphs) satisfying constraints on the number of labels shared by all the vertices and on the number of separated cliques that reach a minimal size. This pattern definition may highlight relations between densely connected subgraphs; for example, it may exhibit genes that bridge different functional modules. It may also reveal groups of proteins for which we have no evidence of interaction but whose coding genes are all over-expressed in the same biological situations. Such patterns might suggest investigating potential interactions. A naive enumeration of all MHCSs in a dataset is intractable in practice, so an efficient algorithm that exploits the properties of the constraints is needed. Using the monotonic and anti-monotonic properties of the constraints, we propose a complete algorithm and show experimentally that it scales well on graphs with hundreds of thousands of vertices. We also performed experiments on a biological dataset built upon STRING, a protein/protein interaction database, and SQUAT, a binarized gene expression database containing results from SAGE experiments.
On this dataset, we looked for MHCSs formed by at least 2 separated cliques having 3 genes, in which all genes are over-expressed in 3 common biological situations. With these parameters, we found groups of genes that are functionally unrelated except for a transcription factor (CRX), which activates genes related to eye development and vision that fall into different functional categories.
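
The constraints described in the abstract can be illustrated with a small checker. This is only a sketch under stated assumptions, not the authors' algorithm: it tests a candidate collection of vertex groups against the three thresholds and does not enumerate patterns or test maximality. The function names, the toy graph, and the reading of "separated" as "pairwise vertex-disjoint" are all assumptions made for illustration.

```python
from itertools import combinations


def is_clique(adj, vertices):
    """Return True if every pair of distinct vertices is adjacent."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))


def is_homogeneous_clique_set(adj, labels, groups,
                              min_cliques, min_size, min_labels):
    """Check a candidate collection of vertex groups (hypothetical helper).

    adj    : dict mapping each vertex to the set of its neighbours
    labels : dict mapping each vertex to its set of labels
    groups : list of vertex lists, each a candidate clique
    """
    # Constraint on the number of cliques.
    if len(groups) < min_cliques:
        return False
    # Each group must be a complete subgraph of at least the minimal size.
    for group in groups:
        if len(group) < min_size or not is_clique(adj, group):
            return False
    # ASSUMPTION: "separated" is read here as pairwise vertex-disjoint.
    for a, b in combinations(groups, 2):
        if set(a) & set(b):
            return False
    # Constraint on the number of labels shared by all the vertices.
    all_vertices = [v for group in groups for v in group]
    if not all_vertices:
        return min_labels == 0
    shared = set.intersection(*(set(labels[v]) for v in all_vertices))
    return len(shared) >= min_labels


# Toy data (invented): two triangles joined by the edge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
labels = {v: {"s1", "s2", "s3"} for v in "abcdef"}
labels["a"] = labels["a"] | {"s4"}  # an extra label not shared by the others

groups = [["a", "b", "c"], ["d", "e", "f"]]
ok = is_homogeneous_clique_set(adj, labels, groups, 2, 3, 3)
too_strict = is_homogeneous_clique_set(adj, labels, groups, 2, 3, 4)
```

With the thresholds used on the biological dataset (2 cliques, 3 genes, 3 shared situations), the toy candidate passes; asking for 4 shared labels makes it fail, since only three labels are common to all six vertices.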
No file deposited

Dates and versions

hal-01381584, version 1 (14-10-2016)

Identifiers

  • HAL Id: hal-01381584, version 1

Cite

Pierre-Nicolas Mougel, Marc Plantevit, Christophe Rigotti, Olivier Gandrillon, Jean-François Boulicaut. A Data Mining Approach to Highlight Relations Between Functional Modules. IPG (Integrative Post-Genomics), Nov 2010, Lyon, France. pp.1-1, 2010. ⟨hal-01381584⟩