
Anytime Subgroup Discovery in High Dimensional Numerical Data

Romain Mathonat (1, 2), Diana Nurbakova (3), Jean-François Boulicaut (2), Mehdi Kaytoue (1, 2)
(2) DM2L - Data Mining and Machine Learning
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
(3) DRIM - Distribution, Recherche d'Information et Mobilité
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract: Subgroup discovery (SD) enables one to elicit patterns that strongly discriminate a class label. When it comes to numerical data, most existing SD approaches discretize the data and thus suffer from information loss. A few algorithms avoid such a loss by considering the search space of every interval pattern built on the dataset's numerical values, and they provide an "anytime" property: at any moment, they can return a result whose quality improves over time. Given a sufficient time and memory budget, they may eventually complete an exhaustive search. However, such approaches are often intractable when dealing with high-dimensional numerical data, for instance, when extracting features from real-life multivariate time series. To overcome these limitations, we propose MonteCloPi, an approach based on a bottom-up exploration of numerical patterns with Monte Carlo Tree Search. It achieves a better trade-off between exploration and exploitation when sampling huge search spaces. Our extensive set of experiments demonstrates the efficiency of MonteCloPi on high-dimensional data with hundreds of attributes. We finally discuss the actionability of the discovered subgroups for skill analysis from Rocket League action logs.
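The abstract does not describe MonteCloPi's internals, so the following is only a minimal, hypothetical Python sketch of the general idea it names: anytime subgroup discovery over interval patterns, explored bottom-up with Monte Carlo Tree Search. All of the following are illustrative assumptions, not details taken from the paper: weighted relative accuracy (WRAcc) as the quality measure, the generalization step (minimally enlarging every interval so the pattern covers one more object), UCB1 as the selection rule, the random rollout policy, and every function name (`point_pattern`, `hull`, `extent`, `wracc`, `mcts_search`).

```python
import math
import random
from typing import List, Optional, Tuple

# Hypothetical sketch, not MonteCloPi itself: measure, moves and names are assumptions.
Interval = Tuple[float, float]
Pattern = Tuple[Interval, ...]  # one closed interval [low, high] per attribute


def point_pattern(row: List[float]) -> Pattern:
    """Most specific interval pattern: covers exactly the values of one object."""
    return tuple((v, v) for v in row)


def hull(p: Pattern, row: List[float]) -> Pattern:
    """Bottom-up generalization step: minimally enlarge p so it also covers row."""
    return tuple((min(lo, v), max(hi, v)) for (lo, hi), v in zip(p, row))


def extent(p: Pattern, data: List[List[float]]) -> List[int]:
    """Indices of the objects whose values all fall inside the intervals of p."""
    return [i for i, row in enumerate(data)
            if all(lo <= v <= hi for (lo, hi), v in zip(p, row))]


def wracc(p: Pattern, data, labels, target) -> float:
    """Weighted relative accuracy (assumed quality measure) w.r.t. the target class."""
    ext = extent(p, data)
    if not ext:
        return 0.0
    n = len(data)
    p_all = sum(1 for y in labels if y == target) / n
    p_ext = sum(1 for i in ext if labels[i] == target) / len(ext)
    return (len(ext) / n) * (p_ext - p_all)


class Node:
    def __init__(self, pattern: Pattern, parent: Optional["Node"] = None):
        self.pattern = pattern
        self.parent = parent
        self.children: List["Node"] = []
        self.untried: Optional[List[int]] = None  # objects not yet used to expand
        self.visits = 0
        self.total = 0.0

    def ucb1(self, c: float) -> float:
        # Mean reward (exploitation) plus a bonus for rarely visited nodes (exploration).
        return (self.total / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))


def mcts_search(data, labels, target, iterations=200, c=math.sqrt(2),
                rollout_depth=3, seed=0) -> Tuple[Pattern, float]:
    """Hypothetical anytime search: the best pattern so far can be read at any time."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == target]
    root = Node(point_pattern(data[rng.choice(positives)]))
    best = (root.pattern, wracc(root.pattern, data, labels, target))

    for _ in range(iterations):
        node = root
        # 1. Selection: descend by UCB1 while the node is fully expanded.
        while node.untried is not None and not node.untried and node.children:
            node = max(node.children, key=lambda ch: ch.ucb1(c))
        # 2. Expansion: generalize with one object the pattern does not cover yet.
        if node.untried is None:
            covered = set(extent(node.pattern, data))
            node.untried = [i for i in range(len(data)) if i not in covered]
            rng.shuffle(node.untried)
        if node.untried:
            child = Node(hull(node.pattern, data[node.untried.pop()]), node)
            node.children.append(child)
            node = child
        # 3. Simulation: random generalization steps; reward = best quality met.
        p = node.pattern
        reward = wracc(p, data, labels, target)
        if reward > best[1]:
            best = (p, reward)
        for _ in range(rollout_depth):
            p = hull(p, data[rng.randrange(len(data))])
            q = wracc(p, data, labels, target)
            if q > best[1]:
                best = (p, q)
            reward = max(reward, q)
        # 4. Backpropagation: update statistics from the new node up to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    return best
```

For example, on the toy dataset `data = [[1.0, 5.0], [2.0, 6.0], [8.0, 1.0], [9.0, 2.0]]` with `labels = [1, 1, 0, 0]`, `mcts_search(data, labels, 1, iterations=50)` returns the pattern `((1.0, 2.0), (5.0, 6.0))`, which covers exactly the two positive objects, with a WRAcc of 0.25. Because every pattern in this sketch is the smallest box around a set of objects, interval bounds are always values that occur in the data, so no discretization is needed; stopping the loop early still yields the best pattern found so far, which is the "anytime" behaviour the abstract refers to.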
Document type:
Conference papers

https://hal.archives-ouvertes.fr/hal-03318017
Contributor: Romain MATHONAT
Submitted on: Monday, August 9, 2021 - 10:18:04 AM
Last modification on: Monday, October 18, 2021 - 9:20:21 AM
Long-term archiving on: Wednesday, November 10, 2021 - 6:22:00 PM

Identifiers

  • HAL Id: hal-03318017, version 1

Citation

Romain Mathonat, Diana Nurbakova, Jean-François Boulicaut, Mehdi Kaytoue. Anytime Subgroup Discovery in High Dimensional Numerical Data. IEEE International Conference on Data Science and Advanced Analytics (DSAA), Oct 2021, Porto, Portugal. ⟨hal-03318017⟩
