Noisy Quantization: theory and practice

Abstract: The effect of errors in variables on quantization is investigated. Given a noisy sample $Z_i=X_i+\epsilon_i$, $i=1,\ldots,n$, where $(X_i)_{i=1,\ldots,n}$ are i.i.d. with law $P$, we want to find the best approximation of the probability distribution $P$ by $k\geq 1$ points, called codepoints. We prove general excess risk bounds with fast rates for an empirical minimization based on a deconvolution kernel estimator. These rates depend on the behaviour of the density of $P$ and on the asymptotic behaviour of the characteristic function of the noise $\epsilon$. This general study can be applied to the problem of $k$-means clustering with noisy data. For this purpose, we introduce a deconvolution $k$-means stochastic minimization which reaches fast rates of convergence under Pollard's standard regularity assumptions. We also introduce a new algorithm for $k$-means clustering with errors in variables. Following the theoretical study, the algorithm combines tools from the inverse problem literature and the machine learning community. Roughly, it is a two-step procedure: (1) a deconvolution step to deal with noisy inputs and (2) Newton's iterations, as in the popular $k$-means algorithm.
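
For orientation, the objects named in the abstract can be written out as follows. This is a sketch in notation that is assumed here, not quoted from the abstract: the symbols $W$, $f$, $\eta$, $K$, $h$, $d$, $\mathcal{F}$ and $\mathcal{K}$ are illustrative, $P$ is assumed to have a density $f$ on $\mathbb{R}^d$, the noise is assumed to have a known density $\eta$, and a single scalar bandwidth $h>0$ is used for simplicity. The distortion of a codebook $\mathbf{c}=(c_1,\ldots,c_k)$ and the excess risk of a candidate $\hat{\mathbf{c}}$ are

$$
W(\mathbf{c}) = \mathbb{E}_P \min_{j=1,\ldots,k} \|X-c_j\|^2 = \int \min_{j=1,\ldots,k} \|x-c_j\|^2 f(x)\,dx,
\qquad
W(\hat{\mathbf{c}}) - \inf_{\mathbf{c}} W(\mathbf{c}).
$$

Since only the $Z_i$ are observed, a natural surrogate replaces $f$ by a deconvolution kernel estimator built from a kernel $K$, with $\mathcal{F}$ denoting the Fourier transform and $\mathcal{K}$ a compact set on which the integral is taken:

$$
\widetilde{K}_h = \mathcal{F}^{-1}\!\left[\frac{\mathcal{F}[K](\cdot)}{\mathcal{F}[\eta](\cdot/h)}\right],
\qquad
\widehat{W}_h(\mathbf{c}) = \frac{1}{n}\sum_{i=1}^{n} \int_{\mathcal{K}} \min_{j=1,\ldots,k} \|x-c_j\|^2 \, \frac{1}{h^d}\,\widetilde{K}_h\!\left(\frac{Z_i-x}{h}\right) dx.
$$

Dividing by $\mathcal{F}[\eta]$ is what makes the decay of the noise characteristic function enter the rates: the faster it decays, the harder the deconvolution step becomes.
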
Document type:
Preprint, working paper
2014

https://hal.archives-ouvertes.fr/hal-01060380
Contributor: Sébastien Loustau
Submitted on: Wednesday, 10 September 2014 - 15:35:40
Last modified on: Monday, 5 February 2018 - 15:00:03
Document(s) archived on: Thursday, 11 December 2014 - 11:45:20

File

jmva.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01060380, version 1

Citation

Camille Brunet, Sébastien Loustau. Noisy Quantization: theory and practice. 2014. 〈hal-01060380〉

Metrics

Record views: 362

File downloads: 112