TopPI: An Efficient Algorithm for Item-Centric Mining

Abstract : We introduce TopPI, a new semantics and algorithm designed to mine long-tailed datasets. For each item, regardless of its frequency, TopPI finds the k most frequent closed itemsets that the item belongs to. For example, in our retail dataset, TopPI finds the itemset "nori seaweed, wasabi, sushi rice, soy sauce", which occurs in only 133 store receipts out of 290 million. It also finds the itemset "milk, puff pastry", which appears 152,991 times. Thanks to dynamic threshold adjustment and an adequate pruning strategy, TopPI efficiently traverses the relevant parts of the search space and can be parallelized on multi-core machines. Our experiments on datasets with different characteristics show the high performance of TopPI and its superiority over state-of-the-art mining algorithms. We show experimentally on real datasets that TopPI allows the analyst to explore and discover valuable itemsets.
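
The item-centric semantics described in the abstract can be illustrated with a small brute-force sketch. This is not the TopPI algorithm itself, which relies on dynamic threshold adjustment and pruning; it only enumerates every candidate itemset on a toy dataset to show what "the k most frequent closed itemsets per item" means. The function names, the toy receipts, and the tie-breaking rule are assumptions made for this example, and whether a closed singleton counts toward an item's top-k is likewise assumed here.

```python
from itertools import combinations


def closed_itemsets(transactions):
    """Map each closed itemset to its support (brute force, toy data only).

    An itemset is closed when it equals the intersection of all
    transactions that contain it, i.e. no proper superset has the
    same support.
    """
    items = sorted(set().union(*transactions))
    closed = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            covering = [t for t in transactions if cand <= t]
            if not covering:
                continue  # itemset never occurs
            closure = frozenset.intersection(*covering)
            if closure == cand:
                closed[cand] = len(covering)
    return closed


def top_k_per_item(transactions, k):
    """For every item, return its k most frequent closed itemsets."""
    closed = closed_itemsets(transactions)
    result = {}
    for item in set().union(*transactions):
        containing = [(s, sup) for s, sup in closed.items() if item in s]
        # Most frequent first; ties broken alphabetically (an assumption).
        containing.sort(key=lambda pair: (-pair[1], sorted(pair[0])))
        result[item] = containing[:k]
    return result


# Hypothetical toy receipts: "milk" is frequent, "nori" is in the long tail.
receipts = [
    frozenset({"milk", "pastry"}),
    frozenset({"milk", "pastry"}),
    frozenset({"milk"}),
    frozenset({"milk", "nori", "rice"}),
    frozenset({"nori", "rice"}),
]
top = top_k_per_item(receipts, k=2)
for item in sorted(top):
    print(item, [(sorted(s), sup) for s, sup in top[item]])
```

In this toy data, a single global frequency threshold high enough to keep the output small would drop every itemset containing "nori", whereas the per-item ranking still returns its most frequent closed itemsets.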

https://hal.archives-ouvertes.fr/hal-01354713
Contributor : Vincent Leroy
Submitted on : Friday, August 19, 2016 - 11:54:29 AM
Last modification on : Friday, October 25, 2019 - 1:30:38 AM
Long-term archiving on : Sunday, November 20, 2016 - 10:22:44 AM

File

toppi.pdf
Files produced by the author(s)

Citation

Martin Kirchgessner, Vincent Leroy, Alexandre Termier, Sihem Amer-Yahia, Marie-Christine Rousset. TopPI: An Efficient Algorithm for Item-Centric Mining. 18th International Conference on Big Data Analytics and Knowledge Discovery, Sep 2016, Porto, Portugal. ⟨10.1007/978-3-319-43946-4_2⟩. ⟨hal-01354713⟩