Sequence-RTG: Efficient and Production-Ready Pattern Mining in System Log Messages - HAL open archive
Conference paper · Year: 2021

Abstract

System logs are a wealth of information that can be leveraged to control the behaviour of a computing and storage infrastructure, detect deviations from normal behaviour, and react accordingly by triggering predefined actions. System log management usually consists of a complex workflow that collects, standardises, indexes, stores, and visualises log messages to help system administration teams in their daily operations. In large-scale data centres, such log management infrastructures can collect millions, if not billions, of messages per day. A key component of this workflow is the identification of message patterns, which requires the expertise of administrators. These patterns are templates made of both static and variable message parts, against which a new log message can be matched. This crucial task is often done manually, but the patterns can change frequently, making it time-consuming for human operators to keep up. Therefore, in this paper we propose to automate the discovery of patterns in system log messages by extending the functionalities of an existing pattern mining framework called Sequence. Our main objectives are to improve both the scalability of this framework and its capacity to be integrated into a complete system log management workflow. We present how we addressed six main limitations of the seminal Sequence tool. These modifications led us to propose Sequence-RTG (Sequence-Ready-To-Go), a more efficient and production-ready version. We analyse its performance in terms of both speed, using datasets of increasing sizes, and accuracy, using datasets from the literature. We also show that, two months after the introduction of Sequence-RTG into the system log management framework of the IN2P3 Computing Centre, we reduced the fraction of messages not matched to any pattern from 75-80% to only 15%.
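To make the notion of a pattern concrete, here is a minimal Python sketch of the matching step only (it does not mine patterns, and it is not the Sequence or Sequence-RTG implementation). It assumes whitespace tokenisation and a hypothetical `%name%` placeholder syntax for variable parts; every other token is static text. The helper names `compile_pattern`, `match` and `unmatched_fraction` are illustrative. The last function computes the fraction of messages not matched to any pattern, the metric quoted above for the IN2P3 deployment.

```python
import re
from typing import Dict, List, Optional

# Hypothetical placeholder syntax (an assumption, not Sequence's own format):
# a "%name%" token marks a variable part; all other tokens are static.
# Placeholder names must be unique within one template.
PLACEHOLDER = re.compile(r"%(\w+)%")


def compile_pattern(template: str) -> re.Pattern:
    """Turn a whitespace-tokenised template into a regex (hypothetical helper)."""
    parts = []
    for token in template.split():
        m = PLACEHOLDER.fullmatch(token)
        if m:
            # Variable part: capture one non-whitespace token under its name.
            parts.append(f"(?P<{m.group(1)}>\\S+)")
        else:
            # Static part: escape so characters such as '[' match literally.
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts))


def match(message: str, patterns: List[re.Pattern]) -> Optional[Dict[str, str]]:
    """Return the variable values of the first pattern matching the whole message."""
    for pat in patterns:
        m = pat.fullmatch(message.strip())
        if m:
            return m.groupdict()
    return None


def unmatched_fraction(messages: List[str], patterns: List[re.Pattern]) -> float:
    """Fraction of messages that no pattern matches."""
    if not messages:
        return 0.0
    misses = sum(1 for msg in messages if match(msg, patterns) is None)
    return misses / len(messages)


patterns = [
    compile_pattern("Accepted password for %user% from %srcip% port %port% ssh2"),
    compile_pattern("Connection closed by %srcip%"),
]
logs = [
    "Accepted password for alice from 10.0.0.5 port 52344 ssh2",
    "Connection closed by 10.0.0.7",
    "kernel: Out of memory: Killed process 4242",
]
print(match(logs[0], patterns))  # {'user': 'alice', 'srcip': '10.0.0.5', 'port': '52344'}
print(round(unmatched_fraction(logs, patterns), 2))  # 0.33
```

Anchoring each template with `fullmatch` means a message only counts as matched when every token is accounted for, so a new, unseen message format shows up in the unmatched fraction rather than being absorbed by a loose partial match.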
Main file
HPCMASPA_submitted.pdf (3.18 MB)
Origin: files produced by the author(s)

Dates and versions

hal-03329605, version 1 (31-08-2021)
hal-03329605, version 2 (09-09-2021)

Identifiers

  • HAL Id: hal-03329605, version 1

Cite

Louise Harding, Fabien Wernli, Frédéric Suter. Sequence-RTG: Efficient and Production-Ready Pattern Mining in System Log Messages. 8th Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications (HPCMASPA), Sep 2021, Portland (virtual), United States. ⟨hal-03329605v1⟩