SeqScout: Using a Bandit Model to Discover Interesting Subgroups in Labeled Sequences

Romain Mathonat 1 Diana Nurbakova 2 Jean-François Boulicaut 1 Mehdi Kaytoue 1
1 DM2L - Data Mining and Machine Learning
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
2 DRIM - Distribution, Recherche d'Information et Mobilité
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract: It is extremely useful to exploit labeled datasets not only to learn models but also to improve our understanding of a domain and its available target classes. The so-called subgroup discovery task has been studied for a long time. It concerns the discovery of patterns, or descriptions, whose sets of supporting objects have interesting properties, e.g., they characterize or discriminate a given target class. Though many subgroup discovery algorithms have been proposed for transactional data, discovering subgroups within labeled sequential data, and thus searching for descriptions as sequential patterns, has been much less studied. In that context, exhaustive exploration strategies cannot be used for real-life applications, and we have to look for heuristic approaches. We propose the algorithm SeqScout to discover interesting subgroups (w.r.t. a chosen quality measure) from labeled sequences of itemsets. This is a new sampling algorithm that mines discriminant sequential patterns using a multi-armed bandit model. It is an anytime algorithm that, for a given budget, finds a collection of local optima in the search space of descriptions and thus subgroups. It requires only light configuration and is independent of the quality measure used for pattern scoring. Furthermore, it is fairly simple to implement. We provide qualitative and quantitative experiments on several datasets to illustrate its added value.
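The abstract describes SeqScout only at a high level. The following is a minimal sketch, not the authors' implementation, of how a multi-armed bandit can drive the sampling of sequential patterns. It rests on our own assumptions: each sequence of the target class is treated as an arm; pulling an arm draws a random generalization of that sequence (each item kept with probability 0.5); the reward is the Weighted Relative Accuracy (WRAcc) of the resulting pattern, used here only as an example quality measure; and arms are chosen with the UCB1 rule. The local-optimization and redundancy-handling steps of a full method are omitted. All function names (`is_subsequence`, `wracc`, `random_generalization`, `bandit_sampling`) and the toy dataset are hypothetical.

```python
import math
import random
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Seq = Tuple[Itemset, ...]  # a sequence (or pattern) is an ordered tuple of itemsets


def is_subsequence(pattern: Seq, seq: Seq) -> bool:
    """True if each itemset of `pattern` is included, in order, in some itemset of `seq`."""
    i = 0
    for itemset in seq:
        # Greedy earliest matching is sufficient for the itemset-subsequence test.
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)


def wracc(pattern: Seq, data: List[Tuple[Seq, str]], target: str) -> float:
    """Weighted Relative Accuracy: coverage * (local target rate - global target rate)."""
    n = len(data)
    covered = [label for seq, label in data if is_subsequence(pattern, seq)]
    if not covered:
        return 0.0
    p_target = sum(1 for _, label in data if label == target) / n
    p_local = sum(1 for label in covered if label == target) / len(covered)
    return len(covered) / n * (p_local - p_target)


def random_generalization(seq: Seq, rng: random.Random) -> Seq:
    """Assumed arm pull: keep each item with probability 0.5, drop emptied itemsets."""
    pattern = []
    for itemset in seq:
        # Sorting makes the random draws reproducible for a fixed seed.
        kept = frozenset(x for x in sorted(itemset) if rng.random() < 0.5)
        if kept:
            pattern.append(kept)
    return tuple(pattern)


def bandit_sampling(data: List[Tuple[Seq, str]], target: str, budget: int,
                    top_k: int = 5, seed: int = 0) -> List[Tuple[Seq, float]]:
    """UCB1 over target-class sequences; returns the best patterns found within `budget` pulls."""
    rng = random.Random(seed)
    arms = [seq for seq, label in data if label == target]
    if not arms:
        return []
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    best: Dict[Seq, float] = {}
    for t in range(1, budget + 1):
        if t <= len(arms):
            a = t - 1  # play every arm once before using the UCB1 score
        else:
            a = max(range(len(arms)),
                    key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        pattern = random_generalization(arms[a], rng)
        q = wracc(pattern, data, target)
        counts[a] += 1
        means[a] += (q - means[a]) / counts[a]  # incremental mean reward of arm a
        best[pattern] = q
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy labeled dataset (hypothetical): sequences of itemsets with class "+" or "-".
data = [
    ((frozenset("a"), frozenset("bc")), "+"),
    ((frozenset("ab"), frozenset("c")), "+"),
    ((frozenset("b"), frozenset("a")), "-"),
    ((frozenset("c"),), "-"),
]

for pattern, q in bandit_sampling(data, "+", budget=50):
    print([sorted(s) for s in pattern], round(q, 3))
```

Because the quality measure is only called as a black box, `wracc` could be replaced by another measure without touching the bandit loop, which mirrors the abstract's claim of independence from the quality measure. Note that UCB1's exploration term assumes rewards in [0, 1], whereas WRAcc lies in [-0.25, 0.25]; a practical implementation might rescale rewards. Stopping the loop at any iteration still yields the best patterns seen so far, which is the anytime property mentioned above.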
Document type: Conference paper

Cited literature: 29 references

https://hal.archives-ouvertes.fr/hal-02282082
Contributor: Romain Mathonat
Submitted on: Monday, September 9, 2019 - 4:50:39 PM
Last modification on: Thursday, November 21, 2019 - 2:20:46 AM

File

PID6064315.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02282082, version 1

Citation

Romain Mathonat, Diana Nurbakova, Jean-François Boulicaut, Mehdi Kaytoue. SeqScout: Using a Bandit Model to Discover Interesting Subgroups in Labeled Sequences. IEEE International Conference on Data Science and Advanced Analytics (DSAA), Oct 2019, Washington, United States. ⟨hal-02282082⟩
