Conference papers

Non-Disjoint Clustered Representation for Distributions over a Population of Cells (poster)

Matthieu Pichené (1), Sucheendra Palaniappan (1), Eric Fabre (1), Blaise Genest (1)
(1) SUMO - SUpervision of large MOdular and distributed systems
Inria Rennes – Bretagne Atlantique, IRISA-D4 - LANGAGE ET GÉNIE LOGICIEL
Abstract : We consider a large homogeneous population of cells, where each cell is governed by the same complex biological pathway. Accurately modeling the inherent variability of biological species is crucial to understanding how the population evolves. In this work, we handle this variability by considering multivariate distributions, where each species is a random variable. Usually, the number of species in a pathway, and thus the number of variables, is high. This appealing approach thus quickly faces the curse of dimensionality: representing exactly the distribution of a large number of variables is intractable. To make this approach tractable, we explore different techniques to approximate the original joint distribution by meaningful and tractable ones. The idea is to consider families of joint probability distributions on large sets of random variables that admit a compact representation, and then select within this family the one that best approximates the desired intractable one. Natural measures of approximation accuracy can be derived from information theory. We compare several representations over distributions of populations of cells obtained from several fine-grained models of pathways (e.g. ODEs). We also explore the usefulness of such approximate distributions for approximate inference algorithms [1, 2] for coarse-grained abstractions of biological pathways [3].

2 Results

Our approximation scheme is to drop most correlations between variables. Indeed, when many variables are conditionally independent, the multivariate distribution can be compactly represented. The key is to keep the most relevant correlations, evaluated using the mutual information (MI) between two variables.

The simplest approximation is called fully factored (FF), and assumes that all the variables are independent. It leads to a very compact representation and fast computations, but also to fairly inaccurate results, as correlations between variables are entirely lost, even for highly correlated species (MI = 0.6).

Alternatively, one can preserve a few of the strongest correlations, selected using MI, giving rise to a set of disjoint clusters of variables. For efficiency reasons, we used clusters of size two. This model was able to capture some of the most significant correlations between pairs of variables (representing around 30% of the total MI), but dropped significant ones (MI = 0.2).
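The disjoint-cluster selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each species has been discretized into a finite number of levels, that MI is estimated empirically from samples (in nats), and that pairs are chosen greedily in decreasing order of MI. The function names `mutual_information` and `greedy_disjoint_pairs` are hypothetical.

```python
import itertools

import numpy as np


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete samples."""
    # Map observed values to consecutive integer codes.
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    xi, yi = xi.ravel(), yi.ravel()
    # Joint frequency table, normalized to a probability table.
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)
    joint /= joint.sum()
    # Marginals and their outer product (the independent model).
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    indep = px @ py
    mask = joint > 0  # terms with zero joint probability contribute 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))


def greedy_disjoint_pairs(samples, max_pairs):
    """Pick up to max_pairs disjoint variable pairs with the highest MI.

    samples: array of shape (n_cells, n_species) with discretized values.
    Greedy selection is an assumption made for this sketch; the source
    only states that pairs are selected using MI.
    """
    n_vars = samples.shape[1]
    scored = sorted(
        (
            (mutual_information(samples[:, i], samples[:, j]), i, j)
            for i, j in itertools.combinations(range(n_vars), 2)
        ),
        reverse=True,
    )
    used, pairs = set(), []
    for mi, i, j in scored:
        if len(pairs) == max_pairs:
            break
        # Clusters must be disjoint: each variable joins at most one pair.
        if i not in used and j not in used:
            pairs.append((i, j, mi))
            used.update((i, j))
    return pairs
```

Variables left out of every pair are treated as independent singletons, as in the fully factored case. Because clusters are disjoint, a variable strongly correlated with two others can keep only one of those correlations, which is one way significant correlations end up dropped.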
Document type :
Conference papers
Contributor : Blaise Genest
Submitted on : Saturday, October 28, 2017 - 9:59:35 AM
Last modification on : Thursday, January 7, 2021 - 4:35:55 PM
Long-term archiving on : Monday, January 29, 2018 - 2:40:39 PM


Files produced by the author(s)


  • HAL Id : hal-01625665, version 1


Matthieu Pichené, Sucheendra Palaniappan, Eric Fabre, Blaise Genest. Non-Disjoint Clustered Representation for Distributions over a Population of Cells (poster). CMSB 2017, 2017, Darmstadt, Germany. pp.324-326. ⟨hal-01625665⟩


