Conference papers

Mining Top-K Largest Tiles in a Data Stream

Abstract: Large tiles in a database are itemsets with the largest area, where the area of an itemset is defined as its frequency in the database multiplied by its size. Mining these large tiles is an important pattern mining problem, since tiles with a large area describe a large part of the database. In this paper, we introduce the problem of mining the top-k largest tiles in a data stream under the sliding window model. We propose a candidate-based approach that summarizes the data stream and produces the top-k largest tiles efficiently for moderate window sizes. We also propose an approximation algorithm with theoretical bounds on the error rate to cope with large windows. In experiments with two real-life datasets, the approximation algorithm is up to a hundred times faster than the candidate-based solution and the baseline algorithms based on state-of-the-art solutions. We also investigate applications of large tile mining in computer vision and in monitoring emerging search topics.
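To make the problem statement concrete, here is a minimal sketch of a naive exact baseline. It is not the candidate-based algorithm or the approximation algorithm proposed in the paper. It assumes that "frequency" means the absolute support count inside the current window, and that the window holds the most recent `window_size` transactions. The names `tile_area`, `top_k_tiles` and `SlidingWindowTiles` are hypothetical. On every arrival it rescans the whole window and scores every itemset contained in some transaction, so its cost is exponential in transaction length. That cost is the reason smarter summaries are needed.

```python
from collections import deque
from itertools import combinations


def tile_area(itemset, window):
    """Area of a tile: support of `itemset` in `window` times its size."""
    support = sum(1 for t in window if itemset <= t)
    return support * len(itemset)


def top_k_tiles(window, k):
    """Naive exact baseline: score every itemset contained in some transaction.

    Any itemset with nonzero support is a subset of at least one transaction,
    so enumerating the subsets of each transaction covers every candidate.
    Exponential in transaction length; only suitable for tiny examples.
    """
    candidates = set()
    for t in window:
        items = sorted(t)
        for r in range(1, len(items) + 1):
            for combo in combinations(items, r):
                candidates.add(frozenset(combo))
    scored = [(tile_area(c, window), c) for c in candidates]
    # Largest area first; ties broken by the sorted item tuple for determinism.
    scored.sort(key=lambda p: (-p[0], tuple(sorted(p[1]))))
    return scored[:k]


class SlidingWindowTiles:
    """Hypothetical wrapper: keeps the last `window_size` transactions."""

    def __init__(self, window_size, k):
        # deque(maxlen=...) silently evicts the oldest transaction on overflow.
        self.window = deque(maxlen=window_size)
        self.k = k

    def add(self, transaction):
        """Append a transaction and return the current top-k (area, itemset) pairs."""
        self.window.append(frozenset(transaction))
        return top_k_tiles(self.window, self.k)


if __name__ == "__main__":
    stream = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]
    miner = SlidingWindowTiles(window_size=3, k=2)
    for t in stream:
        print(miner.add(t))
```

With a window of three transactions, after the third arrival the tile {a, b} has support 3 and size 2, so its area is 6 and it ranks first. When {c, d} arrives, {a, b, c} is evicted and the area of {a, b} drops to 4. A streaming method has to maintain such rankings incrementally instead of recomputing them from scratch.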

Cited literature: 16 references

https://hal.archives-ouvertes.fr/hal-01011374
Contributor: Baptiste Jeudy
Submitted on: Tuesday, December 20, 2016 - 11:33:23 AM
Last modification on: Thursday, March 18, 2021 - 10:18:02 AM
Long-term archiving on: Tuesday, March 21, 2017 - 10:38:24 AM

File: tile.pdf (produced by the author(s))


Citation

Hoang Thanh Lam, Wenjie Pei, Adriana Prado, Baptiste Jeudy, Elisa Fromont. Mining Top-K Largest Tiles in a Data Stream. The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2014), Sep 2014, Nancy, France. pp.82-97, ⟨10.1007/978-3-662-44851-9_6⟩. ⟨hal-01011374⟩

Metrics

Record views: 364
File downloads: 539