Preprints, Working Papers, ...

Methodology for Adding a Variable to a Synthetic Population from Aggregate Data: Example of the Income Variable

Abstract : This paper presents a framework for adding variables to a synthetic population from aggregate data, a problem that has received little attention in the literature. It proposes a new and efficient methodology to meet this practical need. The methodology comprises three stages. The first models the problem theoretically as a multinomial distribution: the addition of a new variable is formulated as an entropy maximization using the variables available in both the synthetic population and the aggregate data. In our case study, this problem cannot be solved directly because of the large number of constraints involved. The second stage therefore presents a heuristic that yields a practical solution by combining Bayes' theorem with the cross-entropy minimization algorithm. However, given the large number of parameters the heuristic must estimate, some of the results obtained prove to be invalid. To rectify this shortcoming, a third stage applies a post-processing method to ensure the consistency of the results. The methodology is described in detail, with examples provided to illustrate each of the three stages. It is then applied to a real-world case study: an income is allocated to each of the 157,000 households in the French city of Nantes based on aggregate data from the FiLoSoFi database. Income is an essential microsimulation variable for taking many social and economic aspects into account (e.g. household purchasing power, redistribution policy, tax policy). Special attention is paid to reproducibility: the databases and R scripts used are all freely available. The method remains general and is applicable to other variables for which aggregate data are available.
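To make the idea of fitting a joint table to aggregate margins concrete, the following is a minimal sketch of iterative proportional fitting (IPF), a standard procedure that, when it converges, yields the table closest to a prior in the cross-entropy (Kullback-Leibler) sense among those matching the given margins. It is not the authors' heuristic and is not taken from their R scripts. The function name `ipf`, the two household categories, the three income brackets, and all numbers are hypothetical and chosen only for illustration.

```python
def ipf(prior, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Rescale `prior` until its row and column sums match the targets.

    Illustrative sketch only: rows stand for household categories known in
    the synthetic population, columns for income brackets known from
    aggregate data (both hypothetical here).
    """
    table = [row[:] for row in prior]  # work on a copy of the prior
    for _ in range(max_iter):
        # Step 1: scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Step 2: scale each column so that it sums to its target.
        col_sums = [sum(row[j] for row in table) for j in range(len(col_targets))]
        for row in table:
            for j, target in enumerate(col_targets):
                if col_sums[j] > 0:
                    row[j] *= target / col_sums[j]
        # Stop once the row sums still match after the column step.
        err = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if err < tol:
            break
    return table


# Hypothetical toy data: 2 household categories, 3 income brackets.
prior = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]  # uninformative prior
row_targets = [60.0, 40.0]                  # households per category
col_targets = [30.0, 50.0, 20.0]            # households per income bracket
fitted = ipf(prior, row_targets, col_targets)
```

With an uninformative prior, the fitted table reduces to the independence table (each cell equals row total times column total divided by the grand total). An informative prior would instead carry over whatever association between categories and brackets it encodes.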
Contributor : Pierre-Olivier Vandanjon
Submitted on : Thursday, July 8, 2021 - 5:04:47 PM
Last modification on : Tuesday, July 13, 2021 - 3:56:26 AM


Files produced by the author(s)


  • HAL Id : hal-03282111, version 1



Boyam Yaméogo, Pierre-Olivier Vandanjon, Pierre Hankach, Pascal Gastineau. Methodology for Adding a Variable to a Synthetic Population from Aggregate Data: Example of the Income Variable. 2021. ⟨hal-03282111⟩


