A note on replacing uniform subsampling by random projections in MCMC for linear regression of tall datasets

Rémi Bardenet 1, * Odalric-Ambrym Maillard 2, *
* Corresponding author
2 TAO - Machine Learning and Optimisation
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
Abstract : New Markov chain Monte Carlo (MCMC) methods have been proposed to tackle inference with tall datasets, i.e., when the number n of data items is intractably large. A large class of these new MCMC methods is based on randomly subsampling the dataset at each MCMC iteration. We investigate whether random projections can replace this random subsampling for linear regression of big streaming data. In the latter setting, random projections have indeed become standard for non-Bayesian treatments. We isolate two issues for MCMC to apply to streaming regression: 1) a resampling issue: MCMC should access the same random projections across iterations to avoid keeping the whole dataset in memory; and 2) a budget issue: making individual MCMC acceptance decisions should require o(n) random projections. While the resampling issue can be satisfactorily tackled, current techniques in random projections and MCMC for tall data do not solve the budget issue, and it may well turn out that it cannot be solved.
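As an illustration of the resampling point only, here is a minimal sketch, not the authors' method. It assumes a Gaussian random projection S of size m x n and accumulates S X and S y one data item at a time, so that only the m x d sketch is kept in memory and later computations can reuse the same projection. The names `sketch_stream`, `m`, and `seed`, the synthetic data, and the plain least-squares solve are all hypothetical choices made for this example. The budget issue is not addressed here.

```python
import numpy as np

def sketch_stream(rows, targets, m, seed=0):
    """Accumulate S @ X and S @ y over a stream, with S an m x n Gaussian matrix.

    Hypothetical helper for illustration: each incoming item (x, y) is
    multiplied by a freshly drawn column of S and then discarded, so the
    full dataset never has to be stored.
    """
    rng = np.random.default_rng(seed)  # fixed seed: the projection is reproducible
    SX, Sy = None, np.zeros(m)
    for x, y in zip(rows, targets):
        s = rng.standard_normal(m) / np.sqrt(m)  # column of S for this item
        if SX is None:
            SX = np.zeros((m, len(x)))
        SX += np.outer(s, x)
        Sy += s * y
    return SX, Sy

# Synthetic regression data (assumed purely for the example).
data_rng = np.random.default_rng(1)
n, d, m = 2000, 3, 200
X = data_rng.standard_normal((n, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * data_rng.standard_normal(n)

# Sketch the stream once, then solve the small m x d least-squares problem.
SX, Sy = sketch_stream(X, y, m)
theta_sketch = np.linalg.lstsq(SX, Sy, rcond=None)[0]
theta_full = np.linalg.lstsq(X, y, rcond=None)[0]
```

Because the sketch is computed once and stored, any later evaluation that depends on the data only through S X and S y sees the same projection at every iteration, which is the requirement stated in issue 1).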
Document type :
Preprints, Working Papers, ...

Cited literature [36 references]

Contributor : Rémi Bardenet
Submitted on : Tuesday, December 29, 2015 - 12:44:04 PM
Last modification on : Wednesday, March 23, 2022 - 3:51:16 PM


Files produced by the author(s)


  • HAL Id : hal-01248841, version 1


Rémi Bardenet, Odalric-Ambrym Maillard. A note on replacing uniform subsampling by random projections in MCMC for linear regression of tall datasets. 2015. ⟨hal-01248841⟩


