Journal article, The VLDB Journal, Year: 2023

A generic framework for efficient computation of top-k diverse results

Abstract

Result diversification is extensively studied in the context of search, recommendation, and data exploration. Numerous algorithms return top-k results that are both diverse and relevant. These algorithms typically contain computational loops that compare the pairwise diversity of records to decide which ones to retain. We propose an access primitive, DivGetBatch(), that replaces repeated pairwise comparisons of record diversity scores with pairwise comparisons of “aggregate” diversity scores of groups of records, thereby reducing the running time of these algorithms while preserving their results. We integrate the access primitive into three representative diversity algorithms and prove that the augmented algorithms return the same results as the originals. We analyze the worst-case and expected-case running times of these algorithms. We propose a computational framework for designing this access primitive, built on a pre-computed index structure, I-tree, that is agnostic to the specific details of the diversity algorithms. We develop principled solutions to construct and maintain I-tree. Our experiments on multiple large real-world datasets corroborate our theoretical findings and show speedups of up to 24x.
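To make the idea of comparing against a group instead of each record more concrete, here is a minimal Python sketch. It is not the authors' DivGetBatch() or I-tree. It assumes a simple threshold-based diversity rule (scan records by descending relevance and keep one if it is at least `theta` away from every record kept so far, using 2-D Euclidean distance), and it swaps in a flat grid of cells as a stand-in for a group index. Each cell stores a center and a radius. By the triangle inequality, a single distance `d` from the candidate to a cell's center settles the whole group of kept records in that cell: skip the group if `d - radius >= theta`, reject the candidate if `d + radius < theta`, and fall back to pairwise checks only when neither bound applies. The names `baseline`, `batched`, `build_groups`, `cell_of`, and the `cell` parameter are hypothetical.

```python
import math
import random
from collections import defaultdict

# Hypothetical setup, not the paper's method: a record is
# (record_id, relevance, (x, y)) and diversity is Euclidean distance.


def baseline(records, theta):
    """Scan by descending relevance; keep a record if it is at least
    theta away from every record kept so far (one comparison per pair)."""
    kept, ids, ops = [], [], 0
    for rid, _, p in sorted(records, key=lambda r: (-r[1], r[0])):
        ok = True
        for q in kept:
            ops += 1
            if math.dist(p, q) < theta:
                ok = False
                break
        if ok:
            kept.append(p)
            ids.append(rid)
    return ids, ops


def cell_of(p, cell):
    """Grid cell key of point p (hypothetical grouping scheme)."""
    return (math.floor(p[0] / cell), math.floor(p[1] / cell))


def build_groups(records, cell):
    """Precompute each cell's center and radius (max member distance)."""
    members = defaultdict(list)
    for _, _, p in records:
        members[cell_of(p, cell)].append(p)
    groups = {}
    for key, pts in members.items():
        center = ((key[0] + 0.5) * cell, (key[1] + 0.5) * cell)
        groups[key] = (center, max(math.dist(center, q) for q in pts))
    return groups


def batched(records, theta, cell):
    """Same selection rule, but test each candidate against whole groups
    of kept records first, using triangle-inequality bounds."""
    groups = build_groups(records, cell)
    kept = defaultdict(list)  # cell key -> kept points in that cell
    ids, ops = [], 0
    for rid, _, p in sorted(records, key=lambda r: (-r[1], r[0])):
        ok = True
        for key, pts in kept.items():
            center, radius = groups[key]
            ops += 1
            d = math.dist(p, center)
            if d - radius >= theta:  # every kept point here is far enough
                continue
            if d + radius < theta:   # every kept point here is too close
                ok = False
                break
            for q in pts:            # bounds inconclusive: compare pairwise
                ops += 1
                if math.dist(p, q) < theta:
                    ok = False
                    break
            if not ok:
                break
        if ok:
            kept[cell_of(p, cell)].append(p)
            ids.append(rid)
    return ids, ops


if __name__ == "__main__":
    random.seed(7)
    data = [(i, random.random(), (random.uniform(0, 100), random.uniform(0, 100)))
            for i in range(2000)]
    ids_a, ops_a = baseline(data, theta=8.0)
    ids_b, ops_b = batched(data, theta=8.0, cell=10.0)
    assert ids_a == ids_b
    print(f"{len(ids_a)} kept; {ops_a} vs {ops_b} distance computations")
```

Because both bounds follow from the triangle inequality, `batched` keeps exactly the same records as `baseline`, up to floating-point rounding at the exact threshold. Only the number of distance computations changes, and how much it drops depends on the data, `theta`, and the cell size.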
Main file
main2.pdf (706.6 KB)
Origin: files produced by the author(s)

Dates and versions

hal-04239842, version 1 (14-10-2023)

Identifiers

Cite

Md Mouinul Islam, Mahsa Asadi, Sihem Amer-Yahia, Senjuti Basu Roy. A generic framework for efficient computation of top-k diverse results. The VLDB Journal, 2023, 32 (4), pp.737-761. ⟨10.1007/s00778-022-00770-0⟩. ⟨hal-04239842⟩