FixMe: A Self-organizing Isolated Anomaly Detection Architecture for Large Scale Distributed Systems

Emmanuelle Anceaume (1, 2), Romaric Ludinard (3), Bruno Sericola (3), Erwan Le Merrer (4), Gilles Straub (4)
1 CIDER
IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
2 CIDRE - Confidentialité, Intégrité, Disponibilité et Répartition
IRISA-D1 - SYSTÈMES LARGE ÉCHELLE, Inria Rennes – Bretagne Atlantique, CentraleSupélec
3 DIONYSOS - Dependability Interoperability and perfOrmance aNalYsiS Of networkS
Inria Rennes – Bretagne Atlantique, IRISA-D2 - RÉSEAUX, TÉLÉCOMMUNICATION ET SERVICES
Abstract : Monitoring a system is the ability to collect and analyze relevant information provided by the monitored devices so as to be continuously aware of the system state. However, the ever-growing complexity and scale of systems make both real-time monitoring and fault detection tedious tasks. The usual option is therefore to focus solely on a subset of information states, so as to provide coarse-grained indicators. As a consequence, detecting isolated failures or anomalies is a challenging issue. In this work, we propose to address this issue by pushing the monitoring task to the edge of the network. We present a peer-to-peer based architecture, which enables nodes to adaptively and efficiently self-organize according to their "health" indicators. By exploiting both temporal and spatial correlations that exist between a device and its vicinity, our approach guarantees that only isolated anomalies (an anomaly is isolated if it impacts solely a monitored device) are reported on the fly to the network operator. We show that the end-to-end detection process, i.e., from the local detection to the management operator reporting, requires a number of messages that is logarithmic in the size of the network.
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-00736922
Contributor : Emmanuelle Anceaume
Submitted on : Sunday, September 30, 2012 - 9:19:09 PM
Last modification on : Friday, November 16, 2018 - 1:39:19 AM
Long-term archiving on : Monday, December 31, 2012 - 3:50:31 AM

File

opodis.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00736922, version 1

Citation

Emmanuelle Anceaume, Romaric Ludinard, Bruno Sericola, Erwan Le Merrer, Gilles Straub. FixMe: A Self-organizing Isolated Anomaly Detection Architecture for Large Scale Distributed Systems. Proceedings of the 16th International Conference On Principles Of Distributed Systems (OPODIS), Dec 2012, Rome, Italy. pp.12. ⟨hal-00736922⟩
