Determinantal Point Processes for Coresets

Nicolas Tremblay 1, Simon Barthelmé 2, Pierre-Olivier Amblard 1
1 GIPSA-DIS - Département Images et Signal
2 GIPSA-PSD - GIPSA Pôle Sciences des Données
Abstract: When one is faced with a dataset too large to be used all at once, an obvious solution is to retain only part of it. In practice this takes a wide variety of forms, but among them "coresets" are especially appealing. A coreset is a (small) weighted sample of the original data that comes with a guarantee: a cost function can be evaluated on the smaller set instead of the larger one, with low relative error. For some classes of problems, and via a careful choice of sampling distribution, iid random sampling has turned out to be one of the most successful methods to build coresets efficiently. However, independent samples are sometimes overly redundant, and one could hope that enforcing diversity would lead to better performance. The difficulty lies in proving coreset properties for non-iid samples. We show that the coreset property holds for samples formed with determinantal point processes (DPPs). DPPs are interesting because they are a rare example of repulsive point processes with tractable theoretical properties, enabling us to construct general coreset theorems. We apply our results to the k-means problem, and give empirical evidence of the superior performance of DPP samples over state-of-the-art methods.
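The coreset property described in the abstract can be illustrated with a minimal sketch: a weighted subsample whose weighted cost approximates the full-data cost for any query. The sketch below uses the simplest baseline the abstract mentions, iid uniform sampling with weights n/m (not the paper's DPP-based construction), applied to the weighted k-means cost; all names and the toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: n points in 2-D, and k candidate cluster centers.
n, k = 10_000, 3
X = rng.normal(size=(n, 2))

def kmeans_cost(points, weights, centers):
    # Weighted k-means cost: sum_i w_i * min_j ||x_i - c_j||^2
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return (weights * d2.min(axis=1)).sum()

# iid uniform "coreset": m points, each reweighted by n/m so that the
# weighted cost on the sample is an unbiased estimate of the full cost.
m = 500
idx = rng.choice(n, size=m, replace=False)
S, w = X[idx], np.full(m, n / m)

centers = rng.normal(size=(k, 2))            # an arbitrary query
full = kmeans_cost(X, np.ones(n), centers)   # cost on all n points
approx = kmeans_cost(S, w, centers)          # cost on the weighted sample
rel_err = abs(approx - full) / full
```

A coreset guarantee asks that `rel_err` be small uniformly over all queries `centers`; the paper's contribution is proving such guarantees when the sample is drawn from a DPP, which spreads points out instead of drawing them independently.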

Contributor: Nicolas Tremblay
Submitted on: Thursday, October 17, 2019 - 5:16:52 PM
Last modification on: Friday, February 7, 2020 - 1:47:58 AM




  • HAL Id: hal-01741533, version 2
  • arXiv: 1803.08700


Nicolas Tremblay, Simon Barthelmé, Pierre-Olivier Amblard. Determinantal Point Processes for Coresets. Journal of Machine Learning Research, Microtome Publishing, 2019. ⟨hal-01741533v2⟩


