Anomaly detection in smart card logs and distant evaluation with Twitter: a robust framework

Emeric Tonnelier; Nicolas Baskiotis; Vincent Guigue; Patrick Gallinari

doi:10.1016/j.neucom.2017.12.067

Journal article, Neurocomputing, Year: 2018

Emeric Tonnelier

Role: Author

Machine Learning and Information Access

Nicolas Baskiotis

Role: Author
PersonId: 13841
IdHAL: baskiotisn
ORCID: 0000-0001-5015-0961
IdRef: 131488856

Machine Learning and Information Access

Vincent Guigue

Role: Author

Machine Learning and Information Access

Patrick Gallinari

Role: Author
PersonId: 751615
IdHAL: patrick-gallinari
ORCID: 0000-0001-9060-9001
IdRef: 070709076

Machine Learning and Information Access

Abstract

Smart card logs constitute a valuable source of information for modeling a public transportation network and characterizing normal or abnormal events; however, this data source carries a high level of noise and missing data and therefore requires robust analysis tools. First, we define an anomaly as any perturbation in the transportation network with respect to a typical day: temporary interruptions, intermittent habit shifts, closed stations, or an unusually high or low number of entrances at a station. The Parisian metro network, with 300 stations and millions of daily trips, is considered as a case study. In this paper, we present four approaches to anomaly detection in a transportation network using smart card logs. The first three approaches infer a daily temporal prototype for each metro station and use a distance measuring the compatibility between a particular day and the inferred prototype. We introduce two simple and strong baselines relying on differential modeling between stations and prototypes in the raw-log space: a raw version (sensitive to volume changes) and a normalized version (sensitive to behavior changes). The third approach is an original matrix factorization algorithm that computes a dictionary of typical behaviors shared across stations, together with the corresponding weights, which allow the reconstruction of denoised station profiles. We propose to measure the distance between stations and prototypes directly in the latent space. The main advantage of this approach is its compactness: each station profile and its inherent variability are described with only a few parameters. The last approach is a user-based model in which abnormal behaviors are first detected for each user at the log level and then aggregated spatially and temporally; as a consequence, this approach is heavier and requires tracking individual users, unlike the previous approaches, which operate on anonymous log data.
In addition, we contribute an evaluation framework: we listed particular days, and we also mined the RATP Twitter account to obtain (partial) ground truth information about operating incidents. Experiments show that matrix factorization is very robust in various situations, while the user-based model is particularly effective at detecting small incidents reported in the Twitter dataset.
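
As a rough illustration of the difference between the raw and normalized baselines described in the abstract, the following Python sketch compares one day of entrance counts at a station with a prototype. It is not the authors' implementation: the median prototype, the L1 distance, the six time bins, the function names, and the synthetic counts are all assumptions made for this example.

```python
import numpy as np


def station_prototype(days):
    """Assumed prototype: per-bin median over days; `days` is (n_days, n_bins)."""
    return np.median(days, axis=0)


def raw_distance(day, prototype):
    """L1 gap between entrance counts, so it reacts to volume changes."""
    return float(np.abs(day - prototype).sum())


def normalized_distance(day, prototype, eps=1e-12):
    """L1 gap between profiles rescaled to sum to 1, so it reacts to shape changes."""
    p = day / (day.sum() + eps)
    q = prototype / (prototype.sum() + eps)
    return float(np.abs(p - q).sum())


# Synthetic entrance counts for one hypothetical station, six time bins per day.
typical = np.array([5, 40, 20, 15, 35, 10], dtype=float)
history = np.stack([typical * s for s in (0.9, 1.0, 1.1, 1.0, 0.95)])
prototype = station_prototype(history)

# Same daily shape, twice the volume (for example, a large event nearby).
busy_day = typical * 2.0
# Same total volume, but the morning peak has moved to the evening.
shifted_day = np.array([5, 10, 20, 15, 35, 40], dtype=float)

for name, day in [("typical", typical), ("busy", busy_day), ("shifted", shifted_day)]:
    print(name, raw_distance(day, prototype), normalized_distance(day, prototype))
```

In this toy setting, the doubled-volume day is far from the prototype under the raw distance but almost identical to it under the normalized distance, while the shifted day is flagged by both. This matches the abstract's description of one version as sensitive to volume and the other to behavior.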

Domains

Artificial intelligence [cs.AI]

Contributor: Vincent Guigue

https://hal.science/hal-02503474

Submitted on: Tuesday, 10 March 2020, 09:03:28

Last modified on: Saturday, 7 October 2023, 21:36:22

Dates and versions

hal-02503474, version 1 (10-03-2020)

Identifiers

HAL Id: hal-02503474, version 1
DOI: 10.1016/j.neucom.2017.12.067

Cite

Emeric Tonnelier, Nicolas Baskiotis, Vincent Guigue, Patrick Gallinari. Anomaly detection in smart card logs and distant evaluation with Twitter: a robust framework. Neurocomputing, 2018, 298, pp.109-121. ⟨10.1016/j.neucom.2017.12.067⟩. ⟨hal-02503474⟩

Collections

CNRS LIP6 SORBONNE-UNIVERSITE SU-SCIENCES
