Dependency Weighted Aggregation on Factorized Databases

Florent Capelli 1, 2 Nicolas Crosetti 1 Joachim Niehren 1 Jan Ramon 3
1 LINKS - Linking Dynamic Data
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
3 MAGNET - Machine Learning in Information Networks
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract : We study a new class of aggregation problems, called dependency weighted aggregation. The underlying idea is to aggregate the answer tuples of a query while accounting for dependencies between them, where two tuples are considered dependent when they have the same value on some attribute. The main problem we are interested in is computing the dependency weighted count of a conjunctive query. This aggregate can be seen as a form of weighted counting, where the weights of the answer tuples are computed by solving a linear program. This linear program enforces that dependent tuples are not overrepresented in the final weighted count. The dependency weighted count can be used to compute the s-measure, a measure that is used in data mining to estimate the frequency of a pattern in a graph database. Computing the dependency weighted count of a conjunctive query is NP-hard in general. In this paper, we show that this problem is actually tractable for a large class of structurally restricted conjunctive queries such as acyclic or bounded hypertree width queries. Our algorithm works on a factorized representation of the answer set, in order to avoid enumerating it exhaustively. Our technique produces a succinct representation of the weighting of the answers. It can be used to solve other dependency weighted aggregation tasks, such as computing the (dependency) weighted average of the value of an attribute in the answer set.
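To make the linear program mentioned in the abstract concrete, here is one plausible formalization. It is a sketch under assumptions, not a statement taken from this record: the symbols $Q$, $D$, $X$, $w$ and the bound of 1 per shared attribute value are illustrative choices consistent with the description "dependent tuples are not overrepresented", and the exact definition is in the full text.

```latex
% Assumed notation (not from this record):
%   Q(D)  = answer set of conjunctive query Q on database D
%   X     = attributes (variables) of Q
%   w(a)  = non-negative weight assigned to answer tuple a
\begin{align*}
\text{maximize}   \quad & \sum_{a \in Q(D)} w(a) \\
\text{subject to} \quad & \sum_{\substack{a \in Q(D) \\ a(x) = d}} w(a) \le 1
                          && \text{for every attribute } x \in X \text{ and value } d, \\
                        & w(a) \ge 0 && \text{for every } a \in Q(D).
\end{align*}
% Tiny example under this assumed formulation: answers (x,y) = (1,2) and (1,3)
% share x = 1, so w(1,2) + w(1,3) <= 1 and the optimum is 1,
% whereas the plain (unweighted) count would be 2.
```

Under this reading, tuples that agree on an attribute value must share a single unit of weight, which is why the weighted count can be much smaller than the number of answers.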
Document type :
Preprints, Working Papers, ...

https://hal.archives-ouvertes.fr/hal-01981553
Contributor : Inria Links
Submitted on : Tuesday, January 15, 2019 - 10:34:43 AM
Last modification on : Friday, March 22, 2019 - 1:34:12 AM

Identifiers

  • HAL Id : hal-01981553, version 1
  • ARXIV : 1901.03633

Citation

Florent Capelli, Nicolas Crosetti, Joachim Niehren, Jan Ramon. Dependency Weighted Aggregation on Factorized Databases. 2019. ⟨hal-01981553⟩
