Conference paper, Year: 2009

A new constraint for mining sets in sequences

Abstract

Discovering interesting patterns in event sequences is a popular task in data mining. Most existing methods rely on a measure of cohesion to decide what counts as an occurrence of a pattern, and on a frequency threshold to decide whether the pattern occurs often enough. We introduce a new constraint based on a new interestingness measure that combines the cohesion and the frequency of a pattern. For a dataset consisting of a single sequence, the cohesion is measured as the average length of the smallest intervals containing the pattern for each occurrence of its events, and the frequency is measured as the probability of observing an event of that pattern. We present a similar constraint for datasets consisting of multiple sequences. We present algorithms that efficiently identify the interesting patterns so defined, given a dataset and a user-defined threshold. After applying our method to both synthetic and real-life data, we conclude that it gives intuitive results in a number of applications.
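The single-sequence measure described above can be illustrated with a small brute-force sketch. It is not the authors' algorithm, and several details are assumptions not stated in the abstract: interval length is counted as the number of positions spanned, cohesion is normalized as pattern size divided by the average minimal interval length, and cohesion and frequency are combined by multiplication. The function names `min_window_length` and `interestingness` are hypothetical.

```python
from typing import Hashable, Optional, Sequence, Set


def min_window_length(seq: Sequence[Hashable], pattern: Set[Hashable], t: int) -> Optional[int]:
    """Length of the shortest interval [a, b] with a <= t <= b whose events
    cover every event of the pattern; None if no such interval exists."""
    best = None
    for a in range(t + 1):                  # candidate left ends at or before t
        missing = set(pattern)
        for b in range(a, len(seq)):        # extend the right end until covered
            missing.discard(seq[b])
            if b >= t and not missing:
                length = b - a + 1          # assumption: length counts positions
                if best is None or length < best:
                    best = length
                break                       # a longer b cannot be shorter
    return best


def interestingness(seq: Sequence[Hashable], pattern: Set[Hashable]) -> float:
    """Sketch of a combined cohesion/frequency score for one sequence."""
    pattern = set(pattern)
    if not pattern or not pattern.issubset(seq):
        return 0.0                          # the pattern never fully occurs
    # Positions at which any event of the pattern is observed.
    occurrences = [t for t, event in enumerate(seq) if event in pattern]
    # Frequency: probability of observing an event of the pattern.
    frequency = len(occurrences) / len(seq)
    # Average length of the smallest covering interval per occurrence.
    avg_length = sum(min_window_length(seq, pattern, t) for t in occurrences) / len(occurrences)
    # Assumption: normalize so that a perfectly cohesive pattern scores 1.
    cohesion = len(pattern) / avg_length
    # Assumption: combine the two measures as a product.
    return cohesion * frequency
```

Under these assumptions, a pattern would be reported as interesting when `interestingness(seq, pattern)` meets the user-defined threshold. For example, in the sequence `a x x b` the pattern `{a, b}` covers half of the positions and every occurrence needs an interval of length 4, so it scores lower than the same pattern in `a b c a b`, where the two events always appear side by side.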
No file deposited

Dates and versions

hal-01437649 , version 1 (17-01-2017)

Cite

Boris Cule, Bart Goethals, Céline Robardet. A new constraint for mining sets in sequences. Proc. SIAM Int. Conf. on Data Mining SDM'09, Apr 2009, Sparks, Nevada, United States. pp.317-328, ⟨10.1137/1.9781611972795.28⟩. ⟨hal-01437649⟩