Learning Linear Regression Models over Factorized Joins - Archive ouverte HAL
Conference paper Year: 2016

Learning Linear Regression Models over Factorized Joins

Maximilian Schleich (Author, PersonId: 983426)
Dan Olteanu (Author, PersonId: 983427)
Radu Ciucanu

Abstract

We investigate the problem of building least squares regression models over training datasets defined by arbitrary join queries on database tables. Our key observation is that joins entail a high degree of redundancy in both computation and data representation, which is not required for the end-to-end solution to learning over joins. We propose a new paradigm for computing batch gradient descent that exploits the factorized computation and representation of the training datasets, a rewriting of the regression objective function that decouples the computation of cofactors of model parameters from their convergence, and the commutativity of cofactor computation with relational union and projection. We introduce three flavors of this approach: F/FDB computes the cofactors in one pass over the materialized factorized join; F avoids this materialization and intermixes cofactor and join computation; F/SQL expresses this mixture as one SQL query. Our approach has the complexity of join factorization, which can be exponentially lower than that of standard joins. Experiments with commercial, public, and synthetic datasets show that it outperforms MADlib, Python StatsModels, and R by up to three orders of magnitude.
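The decoupling described in the abstract rests on a standard identity: for the least squares objective, the batch gradient can be written entirely in terms of the cofactors C = XᵀX and d = Xᵀy, so once these are computed (in one pass over the factorized join, in the paper's setting) the descent iterations never revisit the training data. A minimal NumPy sketch of this idea, not the paper's implementation (the function name and parameters are illustrative):

```python
import numpy as np

def bgd_over_cofactors(C, d, n, lr=0.1, iters=5000):
    """Batch gradient descent for least squares using only the cofactors.

    Illustrative sketch: for J(theta) = (1/2n) * ||X theta - y||^2, the
    gradient is (1/n) * (C theta - d) with C = X^T X and d = X^T y, so each
    step costs O(p^2) in the number of parameters p, independent of the
    (possibly huge) join result size n.
    """
    theta = np.zeros(C.shape[0])
    for _ in range(iters):
        theta -= lr * (C @ theta - d) / n
    return theta

# Tiny example: fit y = 1 + x over four training points.
X = np.array([[1., 1.], [1., 2.], [1., 3.], [1., 4.]])  # intercept + feature
y = np.array([2., 3., 4., 5.])
C, d = X.T @ X, X.T @ y          # cofactors: one pass over the data
theta = bgd_over_cofactors(C, d, n=len(y))
```

The design point this illustrates is the one the abstract makes: cofactor computation (one pass over the data, here the two matrix products) is separated from parameter convergence (the gradient loop), so the expensive part can be pushed into the factorized join.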
No file deposited

Dates and versions

hal-01330113 , version 1 (10-06-2016)

Identifiers

  • HAL Id : hal-01330113 , version 1

Cite

Maximilian Schleich, Dan Olteanu, Radu Ciucanu. Learning Linear Regression Models over Factorized Joins. ACM SIGMOD, Jun 2016, San Francisco, United States. ⟨hal-01330113⟩
