Distributed Evaluation of Top-k Temporal Joins

Abstract: We study a particular kind of join, coined Ranked Temporal Join (RTJ), featuring predicates that compare time intervals, together with a scoring function associated with each predicate that quantifies how well it is satisfied. RTJ queries are prevalent in a variety of applications such as network traffic monitoring, task scheduling, and tweet analysis. They are often best interpreted as top-k queries, where only the best matches are returned. We show how to exploit the nature of temporal predicates and the properties of their associated scoring semantics to design TKIJ, an efficient query evaluation approach on a distributed Map-Reduce architecture. TKIJ relies on an offline statistics computation that, given a partitioning of time into granules, computes the distribution of interval endpoints in each granule, and on an online computation that generates query-dependent score bounds. These statistics are used to assign workload to reducers, with the aim of reducing data replication and thus limiting I/O cost. Additionally, high-scoring results are distributed evenly across reducers so that each reducer can prune unnecessary results. Our extensive experiments on synthetic and real datasets show that TKIJ outperforms state-of-the-art competitors and provides very good performance for n-ary RTJ queries on temporal data.
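The abstract does not define TKIJ's scoring functions or the exact format of its statistics. As a minimal sketch, assuming fixed-width granules and a hypothetical linear scoring function for an Allen-style "meets" predicate, the code below shows how per-granule endpoint counts could be computed offline and how a score upper bound could be derived for a pair of granules. The names `endpoint_stats`, `meets_score`, `meets_upper_bound`, and the tolerance `tau` are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from typing import Iterable, Tuple

# Illustrative sketch only; not the TKIJ implementation.
Interval = Tuple[float, float]  # (start, end) with start <= end


def granule_of(t: float, origin: float, width: float) -> int:
    """Index of the half-open granule [origin + g*width, origin + (g+1)*width) containing t.
    Assumes a hypothetical uniform partitioning of time."""
    return int((t - origin) // width)


def endpoint_stats(intervals: Iterable[Interval], origin: float, width: float):
    """Offline pass: count interval start points and end points falling in each granule."""
    starts, ends = Counter(), Counter()
    for s, e in intervals:
        starts[granule_of(s, origin, width)] += 1
        ends[granule_of(e, origin, width)] += 1
    return starts, ends


def meets_score(a: Interval, b: Interval, tau: float) -> float:
    """Hypothetical score for 'a meets b': 1.0 when a ends exactly where b starts,
    decreasing linearly to 0.0 as the gap |end(a) - start(b)| grows to tau."""
    gap = abs(a[1] - b[0])
    return max(0.0, 1.0 - gap / tau)


def meets_upper_bound(g_end: int, g_start: int, width: float, tau: float) -> float:
    """Upper bound on meets_score for any pair whose end(a) lies in granule g_end
    and whose start(b) lies in granule g_start, using the smallest possible gap."""
    if g_end == g_start:
        min_gap = 0.0
    else:
        # Points in adjacent half-open granules can be arbitrarily close, so the
        # infimum of the gap is (number of granules strictly between them) * width.
        min_gap = (abs(g_end - g_start) - 1) * width
    return max(0.0, 1.0 - min_gap / tau)
```

For example, `endpoint_stats([(0.0, 4.0), (4.5, 9.0), (12.0, 15.0)], origin=0.0, width=5.0)` counts two start points in granule 0 and one in granule 2. With `tau=2.0`, `meets_upper_bound(0, 2, 5.0, 2.0)` is `0.0`. A bound of zero, or one below the current k-th best score, would let a reducer skip every pair drawn from that granule combination without scoring it.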

Cited literature: 27 references

https://hal.archives-ouvertes.fr/hal-01266188
Contributor: Julien Pilourdault
Submitted on: Wednesday, February 3, 2016, 17:15:26
Last modified on: Thursday, October 11, 2018, 08:48:05

File

mod368-pilourdault.pdf
Files produced by the author(s)

Citation

Julien Pilourdault, Vincent Leroy, Sihem Amer-Yahia. Distributed Evaluation of Top-k Temporal Joins. To appear in SIGMOD'16. 2016. 〈hal-01266188v2〉

Metrics

Record views: 709

File downloads: 524