Conference paper. Year: 2016

TopPI: An Efficient Algorithm for Item-Centric Mining

Abstract

We introduce TopPI, a new semantics and algorithm designed to mine long-tailed datasets. For each item, regardless of its frequency, TopPI finds the k most frequent closed itemsets to which that item belongs. For example, in our retail dataset, TopPI finds the itemset "nori seaweed, wasabi, sushi rice, soy sauce", which occurs in only 133 store receipts out of 290 million. It also finds the itemset "milk, puff pastry", which appears 152,991 times. Thanks to dynamic threshold adjustment and an adequate pruning strategy, TopPI efficiently traverses the relevant parts of the search space and can be parallelized on multi-core machines. Our experiments on datasets with different characteristics show the high performance of TopPI and its superiority over state-of-the-art mining algorithms. We also show experimentally on real datasets that TopPI allows the analyst to explore and discover valuable itemsets.
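
To make the item-centric semantics concrete, here is a minimal brute-force sketch in Python. It is not the TopPI algorithm: it uses no dynamic threshold adjustment, pruning, or parallelism, and it enumerates every candidate itemset, so it only suits toy inputs. The function names, the tie-breaking rule, and the sample receipts are assumptions made for illustration. The sketch also assumes that an itemset is closed when it equals the intersection of all transactions containing it, and that the closure of a single item counts among that item's results.

```python
from itertools import combinations


def closed_itemsets(transactions):
    """Return {closed itemset: support} by brute force (toy datasets only)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    closed = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            covering = [t for t in transactions if candidate <= t]
            if not covering:
                continue
            # The closure is the intersection of all transactions that
            # contain the candidate; it has the same support as the candidate.
            closure = frozenset.intersection(*covering)
            closed[closure] = len(covering)
    return closed


def top_k_per_item(transactions, k):
    """For each item, return its k most frequent closed itemsets."""
    closed = closed_itemsets(transactions)
    result = {}
    for item in sorted({i for t in transactions for i in t}):
        containing = [(s, sup) for s, sup in closed.items() if item in s]
        # Highest support first; ties are broken by size, then by the
        # sorted item names (an arbitrary choice, made for determinism).
        containing.sort(key=lambda p: (-p[1], len(p[0]), sorted(p[0])))
        result[item] = containing[:k]
    return result


# Hypothetical receipts: one frequent item and one rare, long-tail basket.
receipts = [
    {"milk", "puff pastry"},
    {"milk", "puff pastry"},
    {"milk", "bread"},
    {"milk"},
    {"nori", "wasabi", "sushi rice"},
]
top = top_k_per_item(receipts, k=2)
for item, itemsets in top.items():
    print(item, [(sorted(s), sup) for s, sup in itemsets])
```

In this toy data, the rare item "nori" still receives a result (its single closed itemset, seen once), even though a global frequency threshold tuned for "milk" would discard it. That is the behavior the abstract describes for long-tailed data.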
Main file
toppi.pdf (579.38 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01354713 , version 1 (19-08-2016)

Identifiers

HAL Id: hal-01354713
DOI: 10.1007/978-3-319-43946-4_2

Cite

Martin Kirchgessner, Vincent Leroy, Alexandre Termier, Sihem Amer-Yahia, Marie-Christine Rousset. TopPI: An Efficient Algorithm for Item-Centric Mining. 18th International Conference on Big Data Analytics and Knowledge Discovery, Sep 2016, Porto, Portugal. ⟨10.1007/978-3-319-43946-4_2⟩. ⟨hal-01354713⟩
