Towards easy prototyping of pattern mining problems

Abstract : The discovery of frequent patterns is a famous problem in data mining. While plenty of algorithms have been proposed during the last decade, only a few contributions have tried to understand the influence of datasets on the algorithms' behavior. Explaining why certain algorithms are likely to perform very well or very poorly on some datasets is still an open question. In this setting, we describe a thorough experimental study of datasets with respect to frequent itemsets. We study the distribution of frequent itemsets with respect to itemset size, together with the distribution of three concise representations: frequent closed, frequent free and frequent essential itemsets. For each of them, we also study the distribution of their positive and negative borders whenever possible. From this analysis, we exhibit a new characterization of datasets and some invariants that make it possible to better predict the behavior of well-known algorithms. The main perspective of this work is to devise algorithms that adapt to dataset characteristics.
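To make the quantities named in the abstract concrete, here is a minimal sketch that counts frequent, frequent closed and frequent free itemsets per itemset size on a toy dataset. It is not the authors' method or code: the function names, the toy transactions and the absolute support threshold are all assumptions made for illustration, the enumeration is brute force, the empty itemset is left out, and essential itemsets and borders are not covered.

```python
from collections import Counter
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, minsup):
    """Map each non-empty frequent itemset to its support (brute force, level by level)."""
    items = sorted(set().union(*transactions))
    freq = {}
    for k in range(1, len(items) + 1):
        level = {}
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s >= minsup:
                level[candidate] = s
        if not level:
            # Support is anti-monotone: no frequent itemset of size k
            # means none of any larger size either.
            break
        freq.update(level)
    return freq


def closed_itemsets(freq, transactions):
    """Frequent itemsets with no one-item extension of equal support."""
    items = set().union(*transactions)
    return {
        x: s
        for x, s in freq.items()
        if all(freq.get(x | {a}) != s for a in items - x)
    }


def free_itemsets(freq, transactions):
    """Frequent itemsets whose support differs from that of every one-item reduction."""
    return {
        x: s
        for x, s in freq.items()
        if all(support(x - {a}, transactions) != s for a in x)
    }


def size_distribution(itemsets):
    """Number of itemsets per itemset size."""
    return dict(Counter(len(x) for x in itemsets))


if __name__ == "__main__":
    # Hypothetical toy dataset, chosen only for illustration.
    data = [
        frozenset("abc"),
        frozenset("abc"),
        frozenset("ab"),
        frozenset("a"),
    ]
    freq = frequent_itemsets(data, minsup=2)
    print("frequent:", size_distribution(freq))
    print("closed:  ", size_distribution(closed_itemsets(freq, data)))
    print("free:    ", size_distribution(free_itemsets(freq, data)))
```

On this toy dataset the three distributions differ, which is the kind of per-size contrast the study compares across real datasets; a practical implementation would replace the exhaustive enumeration with a dedicated mining algorithm.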
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01502081
Contributor : Équipe Gestionnaire Des Publications Si Liris
Submitted on : Wednesday, April 5, 2017 - 9:15:18 AM
Last modification on : Friday, January 11, 2019 - 4:54:11 PM

Identifiers

  • HAL Id : hal-01502081, version 1

Citation

Frédéric Flouvat, Fabien de Marchi, Jean-Marc Petit. Towards easy prototyping of pattern mining problems. Colloque sur l'Optimisation et les Systèmes d'Information (COSI'07), Jun 2007, Oran, Algeria. pp.611-623. ⟨hal-01502081⟩
