Non-Disjoint Clustered Representation for Distributions over a Population of Cells (poster)

Matthieu Pichené (1), Sucheendra Palaniappan (1), Eric Fabre (1), Blaise Genest (1)
(1) SUMO - SUpervision of large MOdular and distributed systems
Inria Rennes – Bretagne Atlantique, IRISA_D4 - LANGAGE ET GÉNIE LOGICIEL
Abstract: We consider a large homogeneous population of cells, where each cell is governed by the same complex biological pathway. Accurately modeling the inherent variability of biological species is crucial to understanding how the population evolves. In this work, we handle this variability by considering multivariate distributions, where each species is a random variable. Usually, the number of species in a pathway, and thus the number of variables, is high. This appealing approach therefore quickly faces the curse of dimensionality: representing exactly the distribution of a large number of variables is intractable. To make the approach tractable, we explore different techniques to approximate the original joint distribution by meaningful and tractable ones. The idea is to consider families of joint probability distributions on large sets of random variables that admit a compact representation, and then to select within such a family the distribution that best approximates the desired intractable one. Natural measures of approximation accuracy can be derived from information theory. We compare several representations over distributions of populations of cells obtained from several fine-grained models of pathways (e.g. ODEs). We also explore the value of such approximate distributions for approximate inference algorithms [1, 2] for coarse-grained abstractions of biological pathways [3].

2 Results

Our approximation scheme is to drop most correlations between variables. Indeed, when many variables are conditionally independent, the multivariate distribution can be compactly represented. The key is to keep the most relevant correlations, evaluated using the mutual information (MI) between two variables. The simplest approximation is called fully factored (FF), and assumes that all the variables are independent. It yields a very compact representation and fast computations, but also fairly inaccurate results, because correlations between variables are entirely lost, even for highly correlated species (MI = 0.6). Alternatively, one can preserve a few of the strongest correlations, selected using MI, giving rise to a set of disjoint clusters of variables. For efficiency reasons, we used clusters of size two. This model captured some of the most significant correlations between pairs of variables (representing around 30% of the total MI), but dropped significant ones (MI = 0.2).
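The selection of disjoint size-two clusters by mutual information can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how MI was estimated or how pairs were chosen, so the plug-in (empirical) MI estimator, the greedy selection rule, the function names `mutual_information` and `greedy_pair_clusters`, and the toy binary data are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical (plug-in) mutual information, in bits, of two discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def greedy_pair_clusters(samples, max_pairs):
    """Pick disjoint clusters of size two, strongest MI first (assumed rule).

    samples: dict mapping a species name to its discretized values,
             one value per cell of the population.
    Returns (pairs, singletons), where pairs is a list of (a, b, mi) and
    singletons lists the variables left independent, as in the FF case.
    """
    names = sorted(samples)
    scored = sorted(
        ((mutual_information(samples[a], samples[b]), a, b)
         for a, b in combinations(names, 2)),
        reverse=True,
    )
    used, pairs = set(), []
    for mi, a, b in scored:
        if len(pairs) >= max_pairs:
            break
        if a in used or b in used:
            continue  # clusters must stay disjoint
        pairs.append((a, b, mi))
        used.update((a, b))
    singletons = [v for v in names if v not in used]
    return pairs, singletons


# Hypothetical toy population of 8 cells with 4 binary species levels.
population = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [0, 0, 0, 0, 1, 1, 1, 1],  # copies A: strongly correlated
    "C": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of A in this sample
    "D": [0, 1, 0, 1, 0, 1, 1, 1],  # mostly follows C
}

if __name__ == "__main__":
    pairs, singletons = greedy_pair_clusters(population, max_pairs=2)
    for a, b, mi in pairs:
        print(f"cluster ({a}, {b}): MI = {mi:.3f} bits")
    print("left independent:", singletons)
```

Setting `max_pairs=0` leaves every variable as a singleton, which corresponds to the fully factored case. Because the clusters are disjoint, any correlation involving a variable that is already paired is dropped, which is the limitation the text describes.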
Document type:
Conference paper
CMSB 2017, 2017, Darmstadt, Germany. Springer, LNCS/LNBI (10545), pp.324-326, 2017, CMSB 2017 - 15th International Conference on Computational Methods in Systems Biology

https://hal.archives-ouvertes.fr/hal-01625665
Contributor: Blaise Genest
Submitted on: Saturday, 28 October 2017 - 09:59:35
Last modified on: Wednesday, 16 May 2018 - 11:24:13
Document(s) archived on: Monday, 29 January 2018 - 14:40:39

File

PPFG17.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01625665, version 1

Citation

Matthieu Pichené, Sucheendra Palaniappan, Eric Fabre, Blaise Genest. Non-Disjoint Clustered Representation for Distributions over a Population of Cells (poster). CMSB 2017, 2017, Darmstadt, Germany. Springer, LNCS/LNBI (10545), pp.324-326, 2017, CMSB 2017 - 15th International Conference on Computational Methods in Systems Biology. 〈hal-01625665〉
