Combining Bayesian inference and clustering for transport mode detection from sparse and noisy geolocation data

Danya Bachir 1, 2, 3; Ghazaleh Khodabandelou 1, 2; Vincent Gauthier 1, 2; Mounim El Yacoubi 4, 5; Eric Vachon 6
1 R3S-SAMOVAR - Réseaux, Systèmes, Services, Sécurité
SAMOVAR - Services répartis, Architectures, MOdélisation, Validation, Administration des Réseaux
5 ARMEDIA-SAMOVAR - ARMEDIA
SAMOVAR - Services répartis, Architectures, MOdélisation, Validation, Administration des Réseaux
Abstract : Large-scale, real-time transport mode detection is an open challenge for smart transport research. Although massive mobility data is collected from smartphones, mining mobile network geolocation is non-trivial because the data is sparse, coarse and noisy, and the real transport labels are unknown. In this study, we process billions of Call Detail Records from the Greater Paris region and present the first method for transport mode detection of any traveling device. Cellphone trajectories, which are anonymized and aggregated, are constructed as sequences of visited locations, called sectors. Clustering and Bayesian inference are combined to estimate transport probabilities for each trajectory. First, we apply clustering to sectors, with features constructed using spatial information from mobile networks and transport networks. Then, we extract a subset of 15% of sectors that have road and rail labels (e.g., train stations), while the remaining sectors are multi-modal. The proportion of labels per cluster is used to calculate transport probabilities given each visited sector. Thus, with Bayesian inference, each record updates the transport probability of the trajectory, without requiring the exact itinerary. For validation, we use the travel survey to compare daily average trips per user. With Pearson correlations reaching 0.96 for road and rail trips, the model appears to perform well and to be robust to noise and sparsity.
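
The record-by-record update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two-mode setup, the uniform prior, and the toy label shares are all hypothetical, and the sketch assumes that records are conditionally independent given the mode and that the per-cluster label proportions can be used directly as the evidence term in Bayes' rule.

```python
from typing import Dict, List

Probs = Dict[str, float]


def update_posterior(prior: Probs, evidence: Probs) -> Probs:
    """One Bayesian step: multiply the prior by the evidence, then renormalize."""
    unnormalized = {mode: prior[mode] * evidence[mode] for mode in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        # Evidence rules out every mode; keep the prior rather than divide by zero.
        return dict(prior)
    return {mode: value / total for mode, value in unnormalized.items()}


def trajectory_mode_probabilities(
    sectors: List[str],
    cluster_of_sector: Dict[str, int],
    label_share_per_cluster: Dict[int, Probs],
    prior: Probs,
) -> Probs:
    """Update the mode probabilities once per visited sector (hypothetical helper)."""
    posterior = dict(prior)
    for sector in sectors:
        # Assumption: the share of road/rail labels in the sector's cluster
        # stands in for the evidence contributed by that record.
        evidence = label_share_per_cluster[cluster_of_sector[sector]]
        posterior = update_posterior(posterior, evidence)
    return posterior


# Toy data (invented for illustration only).
cluster_of_sector = {"a": 0, "b": 1, "c": 0}
label_share_per_cluster = {
    0: {"road": 0.8, "rail": 0.2},  # cluster dominated by road-labeled sectors
    1: {"road": 0.5, "rail": 0.5},  # uninformative, multi-modal cluster
}
prior = {"road": 0.5, "rail": 0.5}

result = trajectory_mode_probabilities(
    ["a", "b", "c"], cluster_of_sector, label_share_per_cluster, prior
)
print(result)
```

In this toy setup, a sector from the uninformative cluster leaves the probabilities unchanged, while each sector from the road-dominated cluster shifts mass toward road; the order of the records does not affect the final result.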

https://hal.archives-ouvertes.fr/hal-01939608
Contributor : Danya Bachir
Submitted on : Thursday, February 21, 2019 - 10:45:33 AM
Last modification on : Monday, June 17, 2019 - 5:06:04 PM
Long-term archiving on : Wednesday, May 22, 2019 - 12:50:08 PM

File

sub_77.pdf
Files produced by the author(s)

Citation

Danya Bachir, Ghazaleh Khodabandelou, Vincent Gauthier, Mounim El Yacoubi, Eric Vachon. Combining Bayesian inference and clustering for transport mode detection from sparse and noisy geolocation data. ECML PKDD 2018: Machine Learning and Knowledge Discovery in Databases, Sep 2018, Dublin, Ireland. pp.569-584, ⟨10.1007/978-3-030-10997-4_35⟩. ⟨hal-01939608⟩
