Estimation of cosmological parameters using adaptive importance sampling

We present a Bayesian sampling algorithm called adaptive importance sampling or Population Monte Carlo (PMC), whose computational workload is easily parallelizable and thus has the potential to considerably reduce the wall-clock time required for sampling, along with providing other benefits. To assess the performance of the approach for cosmological problems, we use simulated and actual data consisting of CMB anisotropies, supernovae of type Ia, and weak cosmological lensing, and provide a comparison of results to those obtained using state-of-the-art Markov Chain Monte Carlo (MCMC). For both types of data sets, we find comparable parameter estimates for PMC and MCMC, with the advantage of a significantly lower computational time for PMC. In the case of WMAP5 data, for example, the wall-clock time reduces from several days for MCMC to a few hours using PMC on a cluster of processors. Other benefits of the PMC approach, along with potential difficulties in using the approach, are analysed and discussed.

Objectives of the project: combine three deep surveys of the universe to set new constraints on the evolution scenario of galaxies and large-scale structures, and on the fundamental cosmological parameters.
Example of a survey: WMAP (or Planck) for the Cosmic Microwave Background (CMB) radiation. Temperature variations in the CMB are related to fluctuations in the density of matter in the early universe, and thus carry information about the initial conditions for the formation of cosmic structures such as galaxies, clusters and voids. Some questions in cosmology:
- Will the universe expand forever, or will it collapse?
- What is the shape of the universe?
- Is the expansion of the universe accelerating rather than decelerating?
- Is the universe dominated by dark matter, and what is its concentration?
These questions are answered in terms of a small set of cosmological parameters.

Model (II)
This yields a likelihood of the data given the parameters, parts of which are computed with publicly available codes (e.g. the WMAP5 code for CMB data), combined with a priori knowledge: a uniform prior on a hypercube.
Therefore, statistical inference consists of exploring the a posteriori density of the parameters, a challenging task due to:
- a potentially high-dimensional parameter space (not really an issue here: sampling in R^d, d ~ 10 to 15);
- immensely slow computation of the likelihoods;
- non-linear dependence and degeneracies between parameters, introduced by physical constraints or theoretical assumptions.
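To make the setting concrete, here is a minimal sketch of what such a posterior evaluation could look like in Python; the hypercube bounds and the cheap Gaussian likelihood are purely hypothetical stand-ins for the expensive external likelihood codes (e.g. the WMAP5 likelihood):

import numpy as np

# Hypothetical prior bounds: a uniform prior on the hypercube [0, 1]^10.
D = 10
LOWER = np.zeros(D)
UPPER = np.ones(D)

def log_likelihood(theta):
    # Placeholder for the expensive "numerical box" (e.g. a public CMB
    # likelihood code); here a cheap Gaussian, purely for illustration.
    return -0.5 * np.sum((theta - 0.5) ** 2 / 0.01)

def log_posterior(theta):
    # Uniform prior on the hypercube: constant inside, zero outside.
    theta = np.asarray(theta, dtype=float)
    if np.any(theta < LOWER) or np.any(theta > UPPER):
        return -np.inf
    return log_likelihood(theta)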
Monte Carlo algorithms for the exploration of the a posteriori density π:
- (Naive) Monte Carlo methods: i.i.d. samples from π. Not an option here: π is only known through a "numerical box".
- Importance sampling: i.i.d. samples from a proposal distribution, reweighted towards π.
- Markov chain Monte Carlo (MCMC) methods: a Markov chain with stationary distribution π.
Importance sampling or MCMC?
All of these sampling techniques require time-consuming evaluations of the a posteriori density π for each new draw:
- Importance sampling: allows for parallel computation (a sketch follows below).
- MCMC: cannot be parallelized (well, most variants cannot).
The efficiency of these sampling techniques depends on design parameters:
- Importance sampling: the proposal distribution.
- Metropolis-Hastings type MCMC: the proposal distribution.
→ towards adaptive algorithms that learn on the fly how to tune the design parameters.
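Regarding the parallelism claim above: since importance-sampling draws are i.i.d., the costly evaluations of π can be distributed over workers. A minimal sketch using Python's standard multiprocessing module (the proposal object is assumed to expose rvs and logpdf, as scipy.stats frozen distributions do; log_posterior as in the earlier sketch):

import numpy as np
from multiprocessing import Pool

def importance_sample(log_posterior, proposal, n, processes=8):
    # Draw n i.i.d. points from the proposal, then evaluate the expensive
    # target in parallel: each evaluation is independent of all the others.
    samples = proposal.rvs(size=n)
    with Pool(processes) as pool:
        log_pi = np.array(pool.map(log_posterior, samples))
    # Unnormalized log importance weights: log pi(X_k) - log q(X_k).
    log_w = log_pi - proposal.logpdf(samples)
    return samples, log_w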
Monitoring convergence:
- Importance sampling: explicit criteria such as the Effective Sample Size (ESS) or the normalized perplexity (see the sketch below).
- MCMC: no such explicit criterion.
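Both criteria are cheap functions of the normalized importance weights. A minimal sketch, assuming the usual definitions ESS = 1/Σ_k w̄_k² (with w̄ the normalized weights) and normalized perplexity = exp(Shannon entropy of w̄)/N, with log_w as returned by the sampler above:

import numpy as np

def is_diagnostics(log_w):
    # Normalize the importance weights in a numerically stable way.
    w = np.exp(log_w - np.max(log_w))
    w /= w.sum()
    # Effective Sample Size: roughly the number of "useful" draws among N.
    ess = 1.0 / np.sum(w ** 2)
    # Normalized perplexity in (0, 1]: values near 1 indicate a proposal
    # close to the target; values near 0 indicate weight degeneracy.
    nz = w[w > 0]
    perplexity = np.exp(-np.sum(nz * np.log(nz))) / len(w)
    return ess, perplexity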
Therefore, we decided to run an adaptive importance sampling algorithm, Population Monte Carlo [Robert et al. 2005], and to compare it to an adaptive MCMC algorithm, the adaptive Metropolis algorithm [Haario et al. 1999].

Population Monte Carlo (PMC) algorithm
Idea: choose the best proposal distribution among a set Q of (parametric) distributions, using a criterion based on the Kullback-Leibler divergence between the target π and the proposal. An iterative scheme is used to approximate the solution of this optimization problem.

Population Monte Carlo (PMC) algorithm (II)
Iterative algorithm:
- Initialization: choose an initial proposal distribution q^(0), draw X_1, ..., X_n from it, and compute importance weights ω_k ∝ π(X_k)/q^(0)(X_k), so that the weighted points {(ω_k, X_k)}_k approximate π.
- Based on these samples, update the proposal distribution to

  q^{(1)} = \arg\max_{q \in \mathcal{Q}} \sum_{k=1}^{n} \frac{\omega_k}{\sum_{j=1}^{n} \omega_j} \log q(X_k),

  and draw new weighted points {(ω_k, X_k)}_k that approximate π.
Repeat until further adaptations no longer result in significant improvements of the KL divergence.
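When Q is a family with closed-form weighted maximum-likelihood estimates, the argmax above is explicit. As a minimal sketch, assuming for brevity a single multivariate Gaussian proposal (the actual algorithm uses mixtures of Gaussian or Student-t components updated by an EM-type step), one PMC iteration reads:

import numpy as np
from scipy.stats import multivariate_normal

def pmc_iteration(log_posterior, mean, cov, n=5000):
    # One PMC step: sample from the current Gaussian proposal, compute
    # normalized importance weights, then update (mean, cov) to the
    # weighted moments, which maximize sum_k wbar_k log q(X_k) over
    # the Gaussian family.
    q = multivariate_normal(mean=mean, cov=cov)
    x = q.rvs(size=n)
    log_w = np.array([log_posterior(xi) for xi in x]) - q.logpdf(x)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    new_mean = w @ x
    centered = x - new_mean
    new_cov = (w[:, None] * centered).T @ centered
    return x, w, new_mean, new_cov

Iterating this update and stopping once the normalized perplexity of the weights stabilizes implements the "repeat until" rule above.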

Adaptive Metropolis
Symmetric random walk Metropolis algorithm with a Gaussian proposal distribution, with the "mysterious" (but famous) scaling matrix (2.38²/d) Σ_π, where Σ_π is the unknown covariance matrix of π [Roberts et al. 1997]. "Unknown"?! Estimate it on the fly, from the samples of the algorithm → adaptive Metropolis algorithm (a minimal sketch is given below).
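A minimal sketch of this adaptive Metropolis scheme, recomputing the empirical covariance from the whole past chain for clarity (Haario et al. use a cheaper recursive update with a small regularization term):

import numpy as np

def adaptive_metropolis(log_posterior, x0, n_iter=20000, t_adapt=1000, eps=1e-6):
    d = len(x0)
    scale = 2.38 ** 2 / d   # the famous scaling of [Roberts et al. 1997]
    rng = np.random.default_rng(0)
    chain = np.empty((n_iter, d))
    x = np.asarray(x0, dtype=float)
    log_px = log_posterior(x)
    cov = np.eye(d)         # initial proposal covariance
    for t in range(n_iter):
        y = rng.multivariate_normal(x, scale * cov)
        log_py = log_posterior(y)
        # Symmetric proposal: accept with probability min(1, pi(y)/pi(x)).
        if np.log(rng.random()) < log_py - log_px:
            x, log_px = y, log_py
        chain[t] = x
        if t >= t_adapt:
            # Adapt: plug the running empirical covariance into the proposal.
            cov = np.cov(chain[: t + 1].T) + eps * np.eye(d)
    return chain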

Simulations are run on (1) simulated data, from a "banana" density, and (2) real data.

Simulated data
The target distribution is in R^10. [Figure: marginal distribution of (x_1, x_2).]
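For reference, the "banana" target is commonly the twisted-Gaussian benchmark of Haario et al.; a sketch of its log-density in R^10 under one standard parameterization (the bananicity b and scale sigma1 below are assumed values, not taken from the talk):

import numpy as np

def log_banana(x, b=0.03, sigma1=10.0):
    # Twisted Gaussian: take N(0, diag(sigma1^2, 1, ..., 1)) and bend the
    # second coordinate as a quadratic function of the first.
    y = np.array(x, dtype=float)
    y[1] += b * (y[0] ** 2 - sigma1 ** 2)
    var = np.ones_like(y)
    var[0] = sigma1 ** 2
    return -0.5 * np.sum(y ** 2 / var)

Only the (x_1, x_2) marginal exhibits the banana shape; the remaining eight coordinates are standard normal.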