D. Arthur and S. Vassilvitskii, k-means++: the advantages of careful seeding, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp.1027-1035, 2007.

A. Banerjee, X. Guo, and H. Wang, On the optimality of conditional expectation as a Bregman predictor, IEEE Transactions on Information Theory, vol.51, issue.7, pp.2664-2669, 2005.

A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, Clustering with Bregman divergences, Journal of Machine Learning Research, vol.6, pp.1705-1749, 2005.

G. Biau, L. Devroye, and G. Lugosi, On the performance of clustering in Hilbert spaces, IEEE Trans. Inform. Theory, vol.54, issue.2, pp.781-790, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00290855

S. Boucheron, O. Bousquet, and G. Lugosi, Theory of classification: a survey of some recent advances, ESAIM Probab. Stat, vol.9, pp.323-375, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00017923

S. Boucheron, G. Lugosi, and P. Massart, Concentration inequalities. A nonasymptotic theory of independence, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00777381

L. M. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Computational Mathematics and Mathematical Physics, vol.7, pp.200-217, 1967.

C. Brownlees, E. Joly, and G. Lugosi, Empirical risk minimization for heavy-tailed losses, Ann. Statist, vol.43, issue.6, pp.2507-2536, 2015.

H. Cardot, P. Cénac, and P. Zitt, Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm, Bernoulli, vol.19, issue.1, pp.18-43, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00558481

O. Catoni and I. Giulini, Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector, 2018.

N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games, 2006.

F. Chazal, D. Cohen-Steiner, and Q. Mérigot, Geometric Inference for Probability Measures based on Distance Functions, Foundations of Computational Mathematics, vol.11, pp.733-751, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00383685


F. Chazal, L. J. Guibas, S. Y. Oudot, and P. Skraba, Persistence-based clustering in Riemannian manifolds, J. ACM, vol.60, issue.6, 2013.
URL : https://hal.archives-ouvertes.fr/inria-00389390

R. Coe and R. D. Stern, Fitting Models to Daily Rainfall Data, Journal of Applied Meteorology, vol.21, pp.1024-1031, 1982.

J. A. Cuesta-Albertos, A. Gordaliza, and C. Matrán, Trimmed k-means: an attempt to robustify quantizers, Ann. Statist, vol.25, pp.553-576, 1997.

D. Donoho and P. J. Huber, The notion of breakdown point, pp.157-184, 1983.

R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2000.

L. A. García-Escudero, A. Gordaliza, C. Matrán, and A. Mayo-Iscar, A general trimming approach to robust cluster analysis, Ann. Statist, vol.36, issue.3, pp.1324-1345, 2008.

S. Evert, A simple LNRE model for random character sequences, Proceedings of the 7èmes Journées Internationales d'Analyse Statistique des Données Textuelles, pp.411-422, 2004.

A. Fischer, Quantization and clustering with Bregman divergences, J. Multivariate Anal, vol.101, pp.2207-2221, 2010.

H. Fritz, L. A. García-Escudero, and A. Mayo-Iscar, tclust: An R Package for a Trimming Approach to Cluster Analysis, Journal of Statistical Software, vol.47, pp.1-26, 2012.

A. Gordaliza, Best approximations to random variables based on trimming procedures, J. Approx. Theory, vol.64, pp.162-180, 1991.

R. M. Gray, Distortion measures for speech processing, IEEE Transactions on Acoustics, Speech and Signal Processing, vol.28, pp.367-376, 1980.

M. Hahsler, M. Piekenbrock, and D. Doran, dbscan: Fast Density-Based Clustering with R, Journal of Statistical Software, vol.91, pp.1-30, 2019.

A. Kumar and R. Kannan, Clustering with spectral norm and the k-means algorithm, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science-FOCS 2010, pp.299-308, 2010.

G. Lecué and M. Lerasle, Robust machine learning by median-of-means: theory and practice, 2017.

C. Levrard, Nonasymptotic bounds for vector quantization in Hilbert spaces, Ann. Statist, vol.43, issue.2, pp.592-619, 2015.

C. Levrard, Quantization/Clustering: when and why does k-means work?, Journal de la Société Française de Statistique, vol.159, issue.1, pp.1-26, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01667014

T. Linder, Learning-theoretic methods in vector quantization, Principles of nonparametric learning, vol.434, pp.163-210, 2001.

S. P. Lloyd, Least squares quantization in PCM, IEEE Transactions on Information Theory, vol.28, pp.129-137, 1982.

R. A. Maronna, R. D. Martin, and V. J. Yohai, Robust Statistics: Theory and Methods, Wiley Series in Probability and Statistics, 2006.

S. Mendelson and R. Vershynin, Entropy and the combinatorial dimension, Invent. Math., vol.152, pp.37-55, 2003.

F. Nielsen, J. D. Boissonnat, and R. Nock, Bregman Voronoi diagrams: properties, algorithms and applications, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00137865

R. T. Rockafellar, Convex Analysis, 1970.

A. Strehl and J. Ghosh, Cluster ensembles -A knowledge reuse framework for combining multiple partitions, Journal of Machine Learning Research, vol.3, pp.583-617, 2002.

C. Tang and C. Monteleoni, On Lloyd's Algorithm: New Theoretical Insights for Clustering in Practice, Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, vol.51, pp.1280-1289, 2016.

T. Arnold and L. Tilton, Humanities Data in R: Exploring Networks, Geospatial Data, Images, and Text, Quantitative Methods in the Humanities and Social Sciences, 1st ed., 2015.

, 25) and c_3 = (40, 40), with I_2 the identity matrix on R^2 and σ = (σ_1, σ_2, σ_3). The first distribution L_1 corresponds to clusters with the same variance, with σ = (5, 5, 5); the second distribution L_2 to clusters with increasing variance, with σ = (1, 4, 7); and the third distribution L_3 to clusters with increasing and decreasing variance.
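The simulation design above (three isotropic Gaussian clusters in R^2 with centers c_1, c_2, c_3 and per-cluster standard deviations σ_j) can be sketched as follows. This is a minimal illustration, not the paper's code: the center c_1 and the first coordinate of c_2 are elided in the text, so the placeholder values below are assumptions, as is the equal-weight mixing.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(n, centers, sigmas):
    """Draw n points from an equal-weight mixture of isotropic
    Gaussians N(c_j, sigma_j^2 * I_2) in the plane."""
    k = len(centers)
    labels = rng.integers(k, size=n)          # cluster assignment per point
    points = np.array([rng.normal(loc=centers[j], scale=sigmas[j], size=2)
                       for j in labels])
    return points, labels

# c_1 and the first coordinate of c_2 are placeholders (elided in the text);
# c_3 = (40, 40) is taken from the text.
centers = [(0.0, 0.0), (25.0, 25.0), (40.0, 40.0)]
sigma_L1 = (5.0, 5.0, 5.0)   # L_1: same variance across clusters
sigma_L2 = (1.0, 4.0, 7.0)   # L_2: increasing variance

X, y = sample_mixture(300, centers, sigma_L1)
```

Swapping `sigma_L1` for `sigma_L2` yields a draw from the second design; the third design (increasing then decreasing variance) follows the same pattern once its σ values are fixed.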