A note on replacing uniform subsampling by random projections in MCMC for linear regression of tall datasets

Rémi Bardenet 1, * Odalric-Ambrym Maillard 2, *
* Corresponding author
2 TAO - Machine Learning and Optimisation
LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract: New Markov chain Monte Carlo (MCMC) methods have been proposed to tackle inference with tall datasets, i.e., when the number n of data items is intractably large. A large class of these new MCMC methods is based on randomly subsampling the dataset at each MCMC iteration. We investigate whether random projections can replace this random subsampling for linear regression of big streaming data. In the latter setting, random projections have indeed become standard for non-Bayesian treatments. We isolate two issues for MCMC to apply to streaming regression: 1) a resampling issue: MCMC should access the same random projections across iterations to avoid keeping the whole dataset in memory, and 2) a budget issue: making individual MCMC acceptance decisions should require o(n) random projections. While the resampling issue can be satisfactorily tackled, current techniques in random projections and MCMC for tall data do not solve the budget issue, and may well end up showing that solving it is not possible.
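To make the resampling issue concrete, here is a minimal Python sketch of a Gaussian random projection applied to streaming regression data. It is not the authors' algorithm: the function names (`sketch_stream`, `sketched_log_likelihood`, `make_demo`), the Gaussian projection matrix, the chunk size, and the synthetic data are all assumptions made for illustration. The point it illustrates is that each data row is read once, only the k projected rows are kept, and every later likelihood evaluation reuses those same projections.

```python
import numpy as np

def sketch_stream(chunks, k, d, seed=0):
    """Accumulate S @ X and S @ y over a stream of (X_chunk, y_chunk) blocks.

    S is a k-by-n Gaussian matrix with N(0, 1/k) entries (an assumed choice),
    generated block by block so the full dataset is never held in memory.
    """
    rng = np.random.default_rng(seed)
    SX = np.zeros((k, d))
    Sy = np.zeros(k)
    for X_chunk, y_chunk in chunks:
        # Columns of S matching the rows of this chunk; each row is seen once.
        S_block = rng.normal(scale=1.0 / np.sqrt(k), size=(k, X_chunk.shape[0]))
        SX += S_block @ X_chunk
        Sy += S_block @ y_chunk
    return SX, Sy

def sketched_log_likelihood(theta, SX, Sy, sigma=1.0):
    """Gaussian log-likelihood (up to a constant) on the projected data."""
    resid = Sy - SX @ theta
    return -0.5 * (resid @ resid) / sigma**2

def make_demo(n=10000, d=3, k=200, chunk=1000, seed=1):
    """Hypothetical synthetic example: compare the sketched fit to the truth."""
    rng = np.random.default_rng(seed)
    theta_true = np.array([1.0, -2.0, 0.5])
    X = rng.normal(size=(n, d))
    y = X @ theta_true + 0.1 * rng.normal(size=n)
    chunks = ((X[i:i + chunk], y[i:i + chunk]) for i in range(0, n, chunk))
    SX, Sy = sketch_stream(chunks, k, d)
    # Least squares on the k projected rows instead of the n original rows.
    theta_sketch, *_ = np.linalg.lstsq(SX, Sy, rcond=None)
    return theta_true, theta_sketch
```

Here `SX` and `Sy` are computed once and then passed unchanged to `sketched_log_likelihood` at every MCMC iteration, which is one way to read "the same random projections across iterations". The sketch says nothing about the budget issue, i.e. how small k can be relative to n while keeping acceptance decisions reliable.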
Document type:
Preprint, working paper
2015

Cited literature: 36 references

https://hal.archives-ouvertes.fr/hal-01248841
Contributor: Rémi Bardenet
Submitted on: Tuesday, 29 December 2015 - 12:44:04
Last modified on: Saturday, 18 February 2017 - 01:20:28

File

arxiv.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01248841, version 1

Citation

Rémi Bardenet, Odalric-Ambrym Maillard. A note on replacing uniform subsampling by random projections in MCMC for linear regression of tall datasets. 2015. 〈hal-01248841〉

Metrics

Record views: 454

Document downloads: 331