Journal article, Pattern Recognition Letters, 2009

A Lower Bound on the Sample Size needed to perform a Significant Frequent Pattern Mining Task

Stéphanie Jacquemont
François Jacquenet
Marc Sebban

Abstract

In recent years, the problem of assessing the statistical significance of frequent patterns extracted from a given set S of data has received much attention. Since S is always a sample drawn from an unknown underlying distribution, two types of risk can arise during frequent pattern mining: accepting a false frequent pattern or rejecting a true one. In this context, many approaches in the literature treat the dataset size as an application-dependent parameter. There is then a trade-off between the two errors, leading to solutions that control only one risk at the expense of the other. On the other hand, many sampling-based methods have attempted to determine the optimal size of S that ensures a good approximation of the original (potentially infinite) database from which S is drawn. However, these approaches often resort to Chernoff bounds, which do not allow the two risks to be controlled independently. In this paper, we overcome these drawbacks by providing a lower bound on the sample size required to control both risks and perform a significant frequent pattern mining task.
Main file
PRL09.pdf (322.91 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00381667, version 1 (29-07-2009)
hal-00381667, version 2 (26-08-2014)

Cite

Stéphanie Jacquemont, François Jacquenet, Marc Sebban. A Lower Bound on the Sample Size needed to perform a Significant Frequent Pattern Mining Task. Pattern Recognition Letters, 2009, 30 (2009), pp. 960-967. ⟨10.1016/j.patrec.2009.05.002⟩. ⟨hal-00381667v2⟩