Quantization/clustering: when and why does k-means work?

Abstract : Though mostly used as a clustering algorithm, k-means was originally designed as a quantization algorithm: it aims at compressing a probability distribution into k points. Building upon [21, 33], we investigate how and when these two approaches are compatible. We show that, provided the sample distribution satisfies a margin-like condition (in the sense of [27] for supervised learning), both the associated empirical risk minimizer and the output of Lloyd's algorithm provide almost optimal classification in certain cases (in the sense of [6]). We also show that they achieve fast and optimal convergence rates in terms of sample size and compression risk.
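To make the two readings of k-means in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's procedure or notation. It runs textbook Lloyd iterations on toy data, reports the empirical distortion (the quantization view), and reads off nearest-codepoint labels (the clustering view). The helper names `sq_dist`, `nearest`, `distortion` and `lloyd`, the two-blob sample, and the initialization with one point per blob are all assumptions made for illustration.

```python
import random

def sq_dist(x, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, c))

def nearest(x, codebook):
    """Index of the codepoint closest to x (ties go to the lowest index)."""
    return min(range(len(codebook)), key=lambda j: sq_dist(x, codebook[j]))

def distortion(points, codebook):
    """Empirical distortion: mean squared distance to the nearest codepoint."""
    total = sum(sq_dist(x, codebook[nearest(x, codebook)]) for x in points)
    return total / len(points)

def lloyd(points, codebook, n_iter=50):
    """Plain Lloyd iterations starting from a given initial codebook."""
    codebook = [tuple(c) for c in codebook]
    for _ in range(n_iter):
        # Assignment step: group points by nearest codepoint.
        cells = [[] for _ in codebook]
        for x in points:
            cells[nearest(x, codebook)].append(x)
        # Update step: move each codepoint to the mean of its cell;
        # an empty cell keeps its previous codepoint.
        new_codebook = [
            tuple(sum(coord) / len(cell) for coord in zip(*cell)) if cell else c
            for cell, c in zip(cells, codebook)
        ]
        if new_codebook == codebook:
            break  # fixed point reached
        codebook = new_codebook
    return codebook

# Toy sample (illustrative only): two well-separated Gaussian blobs in the plane.
rng = random.Random(0)
sample = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(100)]
sample += [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(100)]

init = [sample[0], sample[100]]  # assumed initialization: one point from each blob
codebook = lloyd(sample, init)
labels = [nearest(x, codebook) for x in sample]

print("distortion before:", distortion(sample, init))
print("distortion after: ", distortion(sample, codebook))
```

On data this well separated, the labels recover the two blobs, which is the kind of situation where a low quantization error and a correct clustering go together. The sketch says nothing about the margin condition or the convergence rates studied in the paper.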
Document type :
Preprints, Working Papers, ...

Cited literature : 36 references

https://hal.archives-ouvertes.fr/hal-01667014
Contributor : Clément Levrard
Submitted on : Monday, January 29, 2018 - 2:11:17 PM
Last modification on : Friday, April 10, 2020 - 5:26:45 PM
Document(s) archived on : Friday, May 25, 2018 - 4:01:37 PM

Files

QuantizationandClusteringHAL.p...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01667014, version 2
  • ARXIV : 1801.03742

Citation

Clément Levrard. Quantization/clustering: when and why does k-means work?. 2018. ⟨hal-01667014v2⟩

Metrics

  • Record views : 147
  • File downloads : 214