HAL open archive
Preprint, working paper. Year: 2020

DiagNet: towards a generic, Internet-scale root cause analysis solution

Abstract

Diagnosing problems in Internet-scale services remains particularly difficult and costly for both content providers and ISPs. Because the Internet is decentralized, the cause of such problems might lie anywhere between an end-user's device and the service datacenters. Further, the set of possible problems and causes is not known in advance, making it impossible in practice to train a classifier with all combinations of problems, causes and locations. In this paper, we explore how different machine learning techniques can be used for Internet-scale root cause analysis using measurements taken from end-user devices. We show how to build generic models that (i) are agnostic to the underlying network topology, (ii) do not require the full set of possible causes to be defined during training, and (iii) can be quickly adapted to diagnose new services. Our solution, DiagNet, adapts concepts from image processing research to handle network and system metrics. We evaluate DiagNet with a multi-cloud deployment of online services with injected faults and emulated clients with automated browsers. We demonstrate promising root cause analysis capabilities, with a recall of 73.9%, including for causes introduced only at inference time.
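The abstract reports recall but does not say how it is computed. As a minimal sketch, assuming the standard per-class definition (for each true cause, the fraction of its samples for which the model returns that same cause), recall could be measured as follows. The function name `recall_per_cause` and the cause labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def recall_per_cause(true_causes, predicted_causes):
    """Return {cause: recall} using the standard definition TP / (TP + FN).

    This is an illustrative assumption, not the paper's evaluation code.
    """
    if len(true_causes) != len(predicted_causes):
        raise ValueError("both sequences must have the same length")
    # Number of samples whose ground-truth cause is each label (TP + FN).
    support = Counter(true_causes)
    # Number of samples where the predicted cause matches the true one (TP).
    hits = Counter(t for t, p in zip(true_causes, predicted_causes) if t == p)
    return {cause: hits[cause] / count for cause, count in support.items()}


# Hypothetical labels: "latency" stands in for a cause that the model
# never saw during training and that only appears at inference time.
truth = ["cpu_stress", "packet_loss", "latency", "packet_loss"]
predicted = ["cpu_stress", "packet_loss", "packet_loss", "packet_loss"]
scores = recall_per_cause(truth, predicted)
```

Reporting recall per cause, rather than a single aggregate, makes it visible whether causes introduced only at inference time are recovered as well as those seen during training.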
Main file
diagnet.pdf (405.02 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02534888, version 1 (7 April 2020)
hal-02534888, version 2 (18 May 2021)

Cite

Loïck Bonniot, Christoph Neumann, François Taïani. DiagNet: towards a generic, Internet-scale root cause analysis solution. 2020. ⟨hal-02534888v1⟩