Journal articles

Co-Clustering of Multivariate Functional Data for the Analysis of Air Pollution in the South of France

Charles Bouveyron 1, 2; Julien Jacques 3; Amandine Schmutz 4; Fanny Simoes 5, 6; Silvia Bottini 5, 6
2 MAASAI - Modèles et algorithmes pour l’intelligence artificielle
CRISAM - Inria Sophia Antipolis - Méditerranée , UNS - Université Nice Sophia Antipolis (1965 - 2019), JAD - Laboratoire Jean Alexandre Dieudonné, Laboratoire I3S - SPARKS - Scalable and Pervasive softwARe and Knowledge Systems
Abstract : Air pollution is a major threat to public health, with clear links to many diseases, especially cardiovascular ones. The spatio-temporal study of pollution is of great interest to governments and local authorities when deciding on public alerts or new city policies against rising pollution. The aim of this work is to study spatio-temporal profiles of environmental data collected in the south of France (Région Sud) by the public agency AtmoSud. The idea is to better understand the exposure of inhabitants to pollutants over a large territory with important differences in terms of geography and urbanism. The data consist of daily measurements of five environmental variables, namely three pollutants (PM10, NO2, O3) and two meteorological factors (pressure and temperature), recorded over six years. These data can be seen as multivariate functional data: quantitative entities evolving over time, for which there is a growing need for methods to summarize and understand them. For this purpose, a novel co-clustering model for multivariate functional data is defined. The model is based on a functional latent block model, which assumes for each co-cluster a probabilistic distribution for multivariate functional principal component scores. A Stochastic EM algorithm, embedding a Gibbs sampler, is proposed for model inference, as well as a model selection criterion for choosing the number of co-clusters. Applying the proposed co-clustering algorithm to environmental data of the Région Sud made it possible to divide the region, composed of 357 zones, into six macro-areas with common exposure to pollution. We showed that pollution profiles vary according to the seasons and that the patterns are conserved over the six years studied.
These results can be used by local authorities to develop targeted programs to reduce pollution at the macro-area level and to identify periods of the year with high pollution peaks in order to set up dedicated health prevention programs. Overall, the proposed co-clustering approach is a powerful resource for analysing multivariate functional data in order to identify intrinsic data structure and summarize variable profiles over long periods of time.
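As a rough illustration of the kind of input the abstract describes, the sketch below computes principal component scores for a grid of multivariate curves (rows standing in for zones, columns for time periods). It is not the authors' implementation. The function name `mfpca_scores`, the array layout and the synthetic data are assumptions made for illustration, and a plain discretized PCA via SVD stands in for a proper multivariate functional PCA. The latent block model and the SEM-Gibbs inference are not shown.

```python
import numpy as np


def mfpca_scores(curves, n_components=2):
    """Discretized stand-in for multivariate functional PCA (illustrative only).

    curves: array of shape (n_rows, n_cols, n_vars, n_times), one multivariate
            curve per (row, column) cell, sampled on a common time grid.
    Returns the scores, shape (n_rows, n_cols, n_components), and the share of
    variance explained by each retained component.
    """
    n_rows, n_cols, n_vars, n_times = curves.shape
    flat = curves.reshape(n_rows * n_cols, n_vars, n_times)

    # Scale each variable so that differences in units (e.g. pressure versus
    # a pollutant concentration) do not dominate the decomposition.
    # Assumes no variable is constant (non-zero standard deviation).
    mean = flat.mean(axis=(0, 2), keepdims=True)
    std = flat.std(axis=(0, 2), keepdims=True)
    scaled = (flat - mean) / std

    # Concatenate the variables of each curve into one long vector, then center.
    stacked = scaled.reshape(n_rows * n_cols, n_vars * n_times)
    centered = stacked - stacked.mean(axis=0)

    # PCA through the singular value decomposition of the centered data.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)

    return scores.reshape(n_rows, n_cols, n_components), explained[:n_components]


# Synthetic example: 12 hypothetical zones, 8 periods, 3 variables, 30 time points.
rng = np.random.default_rng(0)
curves = rng.normal(size=(12, 8, 3, 30))
scores, explained = mfpca_scores(curves, n_components=2)
print(scores.shape)
```

In a co-clustering setting, an array of scores like this one would be what a block model partitions simultaneously along its rows and columns.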
Contributor : Julien Jacques
Submitted on : Tuesday, September 14, 2021 - 11:38:57 AM
Last modification on : Friday, August 5, 2022 - 3:44:12 PM




  • HAL Id : hal-02862177, version 2


Charles Bouveyron, Julien Jacques, Amandine Schmutz, Fanny Simoes, Silvia Bottini. Co-Clustering of Multivariate Functional Data for the Analysis of Air Pollution in the South of France. Annals of Applied Statistics, Institute of Mathematical Statistics, In press. ⟨hal-02862177v2⟩


