Noisy Quantization: theory and practice

Abstract : We investigate the effect of errors in variables on quantization. Given a noisy sample $Z_i=X_i+\epsilon_i$, $i=1,\ldots,n$, where $(X_i)_{i=1,\ldots,n}$ are i.i.d. with law $P$, we want to find the best approximation of the probability distribution $P$ with $k\geq 1$ points, called codepoints. We prove general excess risk bounds with fast rates for an empirical minimization based on a deconvolution kernel estimator. These rates depend on the behaviour of the density of $P$ and on the asymptotic behaviour of the characteristic function of the noise $\epsilon$. This general study can be applied to the problem of $k$-means clustering with noisy data. For this purpose, we introduce a deconvolution $k$-means stochastic minimization which reaches fast rates of convergence under Pollard's standard regularity assumptions. We also introduce a new algorithm to deal with $k$-means clustering with errors in variables. Following the theoretical study, the algorithm combines tools from the inverse problem literature and the machine learning community. Roughly speaking, it is based on a two-step procedure: (1) a deconvolution step to deal with noisy inputs and (2) Newton's iterations, as in the popular $k$-means algorithm.
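The two-step procedure described in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the abstract: the noise is taken to be Laplace with known scale `b`, the smoothing kernel is Gaussian with bandwidth `lam`, the density estimate is evaluated on a regular grid, negative values of the estimate are clipped to zero, and step (2) is written as plain Lloyd-type centroid updates rather than the Newton iterations the abstract mentions. All function and parameter names are hypothetical.

```python
import numpy as np

def deconv_kernel_laplace(u, b, lam):
    """Deconvolution kernel for a Gaussian kernel and Laplace(0, b) noise.

    The Laplace characteristic function is 1 / (1 + b^2 t^2), so dividing the
    Fourier transform of the Gaussian kernel by it gives, in closed form,
    K(u) - (b / lam)^2 K''(u) = phi(u) * (1 + (b / lam)^2 * (1 - u^2)).
    """
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return phi * (1.0 + (b / lam) ** 2 * (1.0 - u**2))

def deconv_density(grid, z, b, lam):
    """Step 1: deconvolution kernel estimate of the density of X on a grid."""
    u = (grid[:, None] - z[None, :]) / lam
    return deconv_kernel_laplace(u, b, lam).mean(axis=1) / lam

def noisy_kmeans_1d(z, k, b, lam, n_grid=512, n_iter=50):
    """Step 2: Lloyd-type updates weighted by the deconvolved density."""
    grid = np.linspace(z.min(), z.max(), n_grid)
    # Assumption: clip negative estimates so they can serve as weights.
    w = np.clip(deconv_density(grid, z, b, lam), 0.0, None)
    # Deterministic start: evenly spaced quantiles of the noisy sample.
    centers = np.quantile(z, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assign every grid point to its nearest codepoint (Voronoi cells).
        labels = np.abs(grid[:, None] - centers[None, :]).argmin(axis=1)
        new = centers.copy()
        for j in range(k):
            cell = labels == j
            mass = w[cell].sum()
            if mass > 0:  # keep the old codepoint if its cell has no mass
                new[j] = (w[cell] * grid[cell]).sum() / mass
        if np.allclose(new, centers):
            break
        centers = new
    return np.sort(centers)
```

For example, with `z` drawn as two well-separated clusters plus Laplace noise of scale `b`, calling `noisy_kmeans_1d(z, k=2, b=b, lam=0.4)` returns two sorted codepoints. The bandwidth value here is arbitrary; in practice its choice drives the quality of the deconvolution step.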
Document type :
Preprints, Working Papers, ...

https://hal.archives-ouvertes.fr/hal-01060380
Contributor : Sébastien Loustau
Submitted on : Wednesday, September 10, 2014 - 3:35:40 PM
Last modification on : Friday, May 10, 2019 - 12:14:02 PM
Long-term archiving on : Thursday, December 11, 2014 - 11:45:20 AM

File

jmva.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01060380, version 1

Citation

Camille Brunet, Sébastien Loustau. Noisy Quantization: theory and practice. 2014. ⟨hal-01060380⟩
