Compensating for population sampling in simulations of epidemic spread on temporal contact networks

Abstract : Data describing human interactions often suffer from incomplete sampling of the underlying population. As a consequence, the study of contagion processes using data-driven models can lead to a severe underestimation of the epidemic risk. Here we present a systematic method to correct this bias and obtain an accurate estimation of the risk in the context of epidemic models informed by high-resolution time-resolved contact data. We consider several such data sets collected in various contexts and perform controlled resampling experiments. We show that the statistical information contained in the resampled data allows us to build surrogate versions of the unknown contacts and that simulations of epidemic processes using these surrogate data sets yield good estimates of the outcome of simulations performed using the complete data set. We discuss limitations and potential improvements of our method.

Human interactions play an important role in determining the potential transmission routes of infectious diseases and other contagion phenomena [1]. Their measurement and characterisation thus represent an invaluable contribution, notably to the study of transmissible diseases [2]. In this context, the use of surveys and diaries in which volunteer participants record their encounters [3–8] has provided crucial insight, despite the memory biases inherent in self-reporting procedures [4, 9, 10]. Moreover, new approaches have emerged to measure contact patterns between individuals, using wearable sensors that can detect the proximity of other similar devices [11–20]. Data gathering efforts have produced data sets describing the contact patterns between individuals in various contexts in the form of temporal networks [15, 17, 21–24]: nodes represent individuals and, at each time step, a link is drawn between pairs of individuals who are in contact [25].
Such data can inform models of epidemic spreading phenomena to evaluate epidemic risks and mitigation strategies [15, 22, 26–31]. However, most data sets suffer from population sampling: despite efforts to maximise participation, for instance through scientific engagement of participants [24, 32], not all individuals agree to participate. Hence, the collected data only contain information on contacts occurring among a fraction of the population under study. Population sampling is known to affect the properties of static networks [33, 34]: various statistical properties of a sampled network may differ from those of the complete system under scrutiny [35], and several works have focused on inferring network statistics from the knowledge of incomplete, sampled network data [36–39]. Both structural and temporal properties of time-varying networks are also affected by sampling effects. In addition, a crucial though poorly studied consequence of population sampling is that simulations of dynamical processes in data-driven models can be affected. For instance, in simulations of epidemic spreading, excluded nodes are by definition unreachable and thus equivalent to immunised nodes. Due to herd vaccination effects, the outcome of simulations of epidemic models on sampled networks is thus underestimated. How to estimate the outcome of dynamical processes on contact networks using incomplete data remains an open question. Here we tackle this issue for incompletely sampled data describing networks of human face-to-face interactions. We do not aim at inferring the true sequence of missing contacts but at estimating the outcome of simulations of models of epidemic spread in the whole population. To this end, we resample available data sets by excluding at random a fraction of the individuals (nodes of the contact network), measure how resampling affects relevant network statistics and show that some crucial properties are stable under resampling.
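The resampling step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the contact data are assumed to be a list of `(t, i, j)` tuples (time step, two node identifiers), and the function name `resample_contacts` and its parameters are hypothetical, not taken from the paper.

```python
import random


def resample_contacts(contacts, fraction_removed, seed=None):
    """Exclude a random fraction of nodes together with all their contacts.

    contacts: list of (t, i, j) tuples (assumed format: time step, node, node).
    fraction_removed: fraction of nodes to exclude, between 0 and 1.
    Returns (kept_contacts, removed_nodes).
    """
    rng = random.Random(seed)
    # Collect every node that appears in at least one contact.
    nodes = sorted({n for _, i, j in contacts for n in (i, j)})
    n_removed = int(round(fraction_removed * len(nodes)))
    removed = set(rng.sample(nodes, n_removed))
    # A contact survives only if both of its endpoints were kept.
    kept = [(t, i, j) for t, i, j in contacts
            if i not in removed and j not in removed]
    return kept, removed
```

Calling `resample_contacts(contacts, 0.3, seed=42)` would return the contacts among the remaining 70% of nodes, which is the only information assumed to be available when building surrogate data.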
We exploit this stability to present a systematic method to construct surrogate contact sequences for the excluded nodes, using only information available in the resampled data. We show that the outcome of simulations performed on the reconstructed data sets, obtained by the union of the resampled and surrogate contacts, reproduces results obtained on the complete data set, while using only the resampled data severely underestimates the epidemic risk. We show the efficiency of our procedure for data collected in three widely different contexts: a conference, a high school and a workplace.
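To make the underestimation concrete, the following sketch runs a simple epidemic on a contact list and counts how many nodes were ever infected. It is an assumption-laden illustration: the text above does not specify the epidemic model, so a discrete-time SIR process is assumed here, and the names `sir_final_size`, `beta` (per-contact infection probability) and `mu` (per-step recovery probability) are hypothetical. For simplicity, recovery is only evaluated at time steps that appear in the contact list.

```python
import random


def sir_final_size(contacts, beta, mu, seed_node, rng):
    """Run a discrete-time SIR process on (t, i, j) contacts.

    Returns the number of nodes that were ever infected (I or R at the end).
    """
    state = {}
    by_time = {}
    for t, i, j in contacts:
        state[i] = "S"
        state[j] = "S"
        by_time.setdefault(t, []).append((i, j))
    state[seed_node] = "I"  # the seed is added even if it has no contacts

    for t in sorted(by_time):
        newly_infected = set()
        for i, j in by_time[t]:
            # Contacts are undirected: try transmission in both directions.
            for src, dst in ((i, j), (j, i)):
                if state[src] == "I" and state[dst] == "S" and rng.random() < beta:
                    newly_infected.add(dst)
        # Nodes infectious at the start of this step may recover.
        for node, s in list(state.items()):
            if s == "I" and rng.random() < mu:
                state[node] = "R"
        for node in newly_infected:
            state[node] = "I"
    return sum(s != "S" for s in state.values())


rng = random.Random(0)
full = [(0, "a", "b"), (1, "b", "c"), (2, "c", "d")]
# Excluding node "b" removes every contact it takes part in.
sampled = [c for c in full if "b" not in c[1:]]
size_full = sir_final_size(full, 1.0, 0.0, "a", rng)
size_sampled = sir_final_size(sampled, 1.0, 0.0, "a", rng)
```

In this toy chain, with certain transmission and no recovery, the epidemic reaches all four nodes on the full data but stays confined to the seed once node `b` is excluded: the missing node acts like an immunised one, which is the bias the surrogate contacts are meant to compensate.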

Contributor : Alain Barrat
Submitted on : Monday, March 16, 2015 - 11:26:37 AM
Last modification on : Thursday, June 21, 2018 - 9:14:04 AM
Long-term archiving on : Monday, April 17, 2017 - 2:50:30 PM


Files produced by the author(s)


  • HAL Id : hal-01131855, version 1
  • ARXIV : 1503.04066



Mathieu Génois, Christian Vestergaard, Ciro Cattuto, Alain Barrat. Compensating for population sampling in simulations of epidemic spread on temporal contact networks. Nature Communications, Nature Publishing Group, 2015, 6, pp.8860. ⟨hal-01131855⟩


