
Interactive anomaly detection in mixed tabular data using Bayesian networks

Abstract: Improvements in processing capabilities over the last decades have quickly led to an increasing use of data analyses involving massive datasets. To retrieve insightful information from any data-driven approach, ensuring good data quality is pivotal. Manually correcting massive datasets requires tremendous effort, is error-prone, and is very costly. While domain knowledge can often enable the development of efficient models for anomaly detection and data correction, such knowledge is sometimes unavailable, and a more generic approach is then needed. This paper presents a novel approach to anomaly detection and correction in mixed tabular data using Bayesian networks. We present an algorithm that detects anomalies and offers correction hints based on Jensen scores computed within the Markov blankets of the considered variables. We also discuss the incremental correction of the detection model using user feedback, as well as additional aspects related to discretization in mixed data and its effects on detection efficiency. Finally, we discuss how functional dependencies can be managed to detect errors while improving faithfulness and computation speed.
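The abstract does not give the scoring formula. The sketch below illustrates the two ingredients it names, a Markov blanket and a conflict score computed over it. It assumes that "Jensen score" refers to the classic data-conflict measure conf(e) = log(∏ᵢ P(eᵢ) / P(e)), where a positive value suggests that the observed values fit together worse than they would if they were independent. Everything else is hypothetical and is not the authors' implementation: the toy network `NETWORK`, the helper names, brute-force enumeration instead of proper inference, and the `suggest_value` correction heuristic. Discretization, user feedback, and functional dependencies are not modeled.

```python
import itertools
import math

# Hypothetical toy network A -> B -> C over binary variables.
# Each variable maps to (parents, CPT); the CPT maps a tuple of parent
# values to a distribution {value: probability}.
DOMAIN = (0, 1)
NETWORK = {
    "A": ((), {(): {0: 0.7, 1: 0.3}}),
    "B": (("A",), {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}}),
    "C": (("B",), {(0,): {0: 0.95, 1: 0.05}, (1,): {0: 0.1, 1: 0.9}}),
}


def markov_blanket(net, x):
    """Parents, children, and the children's other parents of x."""
    parents = set(net[x][0])
    children = {v for v, (ps, _) in net.items() if x in ps}
    spouses = {p for c in children for p in net[c][0]}
    return (parents | children | spouses) - {x}


def joint(net, full):
    """Probability of a complete assignment via the chain rule."""
    prob = 1.0
    for v, (ps, cpt) in net.items():
        prob *= cpt[tuple(full[p] for p in ps)][full[v]]
    return prob


def marginal(net, evidence):
    """P(evidence) by brute-force enumeration of the unobserved variables."""
    hidden = [v for v in net if v not in evidence]
    total = 0.0
    for values in itertools.product(DOMAIN, repeat=len(hidden)):
        total += joint(net, {**evidence, **dict(zip(hidden, values))})
    return total


def conflict_score(net, record, x):
    """Assumed conflict measure log2(prod_i P(e_i) / P(e)), restricted to
    x and its Markov blanket; positive values flag a possible anomaly."""
    scope = {x} | markov_blanket(net, x)
    evidence = {v: record[v] for v in scope}
    p_joint = marginal(net, evidence)
    p_indep = math.prod(marginal(net, {v: val}) for v, val in evidence.items())
    return math.log2(p_indep / p_joint)


def suggest_value(net, record, x):
    """Hypothetical correction hint: the value of x minimizing the conflict score."""
    return min(DOMAIN, key=lambda val: conflict_score(net, {**record, x: val}, x))


record = {"A": 0, "B": 1, "C": 0}  # B disagrees with both its parent and its child
print(markov_blanket(NETWORK, "B"))
print(round(conflict_score(NETWORK, record, "B"), 2))
print(suggest_value(NETWORK, record, "B"))
```

Here the record with B = 1 gets a positive score, while setting B = 0 yields a negative one, so the sketch suggests 0 as the correction. Exhaustive enumeration is exponential in the number of unobserved variables. A real system would instead use an exact inference engine, such as a junction tree, on the learned network.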
Document type: Conference paper
Contributor: Philippe Leray
Submitted on: Wednesday, March 17, 2021, 1:53:05 PM
Last modification on: Friday, March 19, 2021, 7:51:03 AM

HAL Id: hal-03014622, version 1


Evan Dufraisse, Philippe Leray, Raphaël Nedellec, Tarek Benkhelif. Interactive anomaly detection in mixed tabular data using Bayesian networks. 10th International Conference on Probabilistic Graphical Models (PGM 2020), Sep 2020, Aalborg, Denmark. ⟨hal-03014622⟩

