Incomplete graphical model inference via latent tree aggregation

Abstract : Graphical network inference is used in many fields, such as genomics or ecology, to infer the conditional independence structure between variables, for instance from measurements of gene expression or species abundances. In many practical cases, not all variables involved in the network have been observed, and the samples are actually drawn from a distribution where some variables have been marginalized out. This challenges the sparsity assumption commonly made in graphical model inference, since marginalization yields locally dense structures, even when the original network is sparse. We present a procedure for inferring Gaussian graphical models when some variables are unobserved, which accounts for both the influence of missing variables and the low density of the original network. Our model is based on the aggregation of spanning trees, and the estimation procedure relies on the Expectation-Maximization algorithm. We treat the graph structure and the unobserved nodes as missing variables and compute posterior probabilities of edge appearance. To provide a complete methodology, we also propose several model selection criteria to estimate the number of missing nodes. A simulation study and an illustration on flow cytometry data reveal that our method has favorable edge detection properties compared to existing graph inference techniques. The methods are implemented in an R package.
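
The claim that marginalization yields locally dense structures can be illustrated with a small numerical sketch. This is not the method of the paper, only an illustration under assumed values: a Gaussian graphical model on a star graph whose hub is unobserved, with the marginal precision matrix of the remaining nodes obtained by a Schur complement. The function names `marginalize_node` and `count_edges` and the precision values (2.0 on the diagonal, 0.4 on the edges) are hypothetical choices made for this example.

```python
def marginalize_node(K, h):
    """Precision matrix of the remaining variables after integrating out node h.

    For a Gaussian with precision matrix K and a single hidden node h,
    this is the Schur complement K_OO - K_Oh * K_hO / K_hh.
    """
    keep = [i for i in range(len(K)) if i != h]
    return [[K[i][j] - K[i][h] * K[h][j] / K[h][h] for j in keep] for i in keep]


def count_edges(K, tol=1e-12):
    """Number of nonzero off-diagonal entries in the upper triangle of K."""
    p = len(K)
    return sum(abs(K[i][j]) > tol for i in range(p) for j in range(i + 1, p))


# Star graph: node 0 is a hub linked to nodes 1..4 (illustrative values only).
p = 5
K = [[2.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
for i in range(1, p):
    K[0][i] = K[i][0] = 0.4

# Treat the hub as unobserved and compute the precision of nodes 1..4.
K_marg = marginalize_node(K, h=0)

edges_full = count_edges(K)           # edges of the complete (sparse) graph
edges_marginal = count_edges(K_marg)  # edges among the 4 observed nodes
print(edges_full, edges_marginal)     # prints: 4 6
```

In the complete graph the four observed nodes share no edge with one another, yet once the hub is integrated out they form a full clique of six edges, which is the kind of locally dense structure that standard sparse inference is not designed for.
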
Contributor : Geneviève Robin
Submitted on : Friday, January 19, 2018 - 3:19:38 PM
Last modification on : Friday, April 19, 2019 - 4:55:31 PM
HAL Id : hal-01686841, version 1


Geneviève Robin, Christophe Ambroise, Stephane Robin. Incomplete graphical model inference via latent tree aggregation. 2018. ⟨hal-01686841⟩



