A. K. Jain, M. N. Murty, and P. J. Flynn, Data clustering: a review, ACM Computing Surveys, vol.31, issue.3, pp.264-323, 1999.
DOI : 10.1145/331499.331504

A. K. Jain, Data clustering: 50 years beyond K-means, Pattern Recognition Letters, vol.31, issue.8, pp.651-666, 2010.
DOI : 10.1016/j.patrec.2009.09.011

S. Basu, A. Banerjee, and R. J. Mooney, Semi-supervised clustering by seeding, pp.27-34, 2002.

J. B. MacQueen, Some methods for classification and analysis of multivariate observations, Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp.281-297, 1967.

J. Bezdek, Pattern Recognition With Fuzzy Objective Function Algorithms, 1981.
DOI : 10.1007/978-1-4757-0450-1

J. Ward, Hierarchical Grouping to Optimize an Objective Function, Journal of the American Statistical Association, vol.58, issue.301, pp.236-244, 1963.
DOI : 10.1080/01621459.1963.10500845

P. Sneath and R. Sokal, Numerical Taxonomy: The Principles and Practice of Numerical Classification, 1973.

M. Ester, H. Kriegel, J. Sander, and X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, Proceedings of the 2nd international conference on Knowledge Discovery and Data mining KDD'96, pp.226-231, 1996.

M. Ankerst, M. M. Breunig, H. Kriegel, and J. Sander, OPTICS: Ordering points to identify the clustering structure, Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'99), pp.49-60, 1999.

K. Fukunaga and L. D. Hostetler, The estimation of the gradient of a density function, with applications in pattern recognition, IEEE Transactions on Information Theory, vol.21, issue.1, pp.32-40, 1975.
DOI : 10.1109/TIT.1975.1055330

Fig.: Unsupervised classification results for the AVIRIS Hekla hyperspectral data set. (a): AP (OCCR: 55.56%, ACCR: 50.26%); (b): KNNCLUST (OCCR: 72.00%, ACCR: 72.07%); (c): NPSEM (OCCR: 52.99%

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, vol.39, issue.1, pp.1-38, 1977.

G. Celeux and J. Diebolt, A probabilistic teacher algorithm for iterative maximum likelihood estimation, in Classification and Related Methods of Data Analysis, pp.617-623, 1988.

G. Celeux and G. Govaert, A classification EM algorithm for clustering and two stochastic versions, Computational Statistics & Data Analysis, vol.14, issue.3, pp.315-332, 1992.
DOI : 10.1016/0167-9473(92)90042-E
URL : https://hal.archives-ouvertes.fr/inria-00075196

C. E. Rasmussen, The infinite Gaussian mixture model, Advances in Neural Information Processing Systems (NIPS)

C. Antoniak, Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems, The Annals of Statistics, vol.2, issue.6, pp.1152-1174, 1974.
DOI : 10.1214/aos/1176342871

J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.22, issue.8, pp.888-905, 2000.

B. Schölkopf, A. Smola, and K. Müller, Nonlinear Component Analysis as a Kernel Eigenvalue Problem, Neural Computation, vol.10, issue.5, pp.1299-1319, 1998.
DOI : 10.1162/089976698300017467

I. S. Dhillon, S. Mallela, and R. Kumar, A divisive information theoretic feature clustering algorithm for text classification, Journal of Machine Learning Research, vol.3, pp.1265-1287, 2003.

A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, Clustering with Bregman Divergences, Journal of Machine Learning Research, vol.6, pp.1705-1749, 2005.
DOI : 10.1137/1.9781611972740.22

L. Faivishevsky and J. Goldberger, A Nonparametric Information Theoretic Clustering Algorithm, International Conference on Machine Learning, 2010.

M. Wang and F. Sha, Information theoretical clustering via semidefinite programming, International Conference on Artificial Intelligence and Statistics, ser. JMLR Proceedings, pp.761-769, 2011.

A. C. Müller, S. Nowozin, and C. H. Lampert, Information Theoretic Clustering Using Minimum Spanning Trees, DAGM/OAGM Symposium, ser. Lecture Notes in Computer Science, pp.205-215, 2012.
DOI : 10.1007/978-3-642-32717-9_21

G. Ver Steeg, A. Galstyan, F. Sha, and S. DeDeo, Demystifying information-theoretic clustering, International Conference on Machine Learning, 2014.

M. Sugiyama et al., Information-maximization clustering based on squared-loss mutual information, Neural Computation, vol.26, issue.1, pp.84-131, 2014.

B. J. Frey and D. Dueck, Clustering by Passing Messages Between Data Points, Science, vol.315, issue.5814, pp.972-976, 2007.
DOI : 10.1126/science.1136800

J. Kittler and J. Illingworth, Relaxation labelling algorithms - a review, Image and Vision Computing, vol.3, issue.4, pp.206-216, 1985.
DOI : 10.1016/0262-8856(85)90009-5

G. Celeux and J. Diebolt, The SEM algorithm: A probabilistic teacher algorithm derived from the EM algorithm for the mixture problem, Computational Statistics Quarterly, vol.2, pp.73-82, 1985.

D. L. Davies and D. W. Bouldin, A Cluster Separation Measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.1, issue.2, pp.224-227, 1979.
DOI : 10.1109/TPAMI.1979.4766909

J. Dunn, A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters, Journal of Cybernetics, vol.3, issue.3, pp.32-57, 1973.
DOI : 10.1080/01969727308546046

T. Zhang, R. Ramakrishnan, and M. Livny, BIRCH: an efficient data clustering method for very large databases, Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD'96), pp.103-114, 1996.

A. Lorette, X. Descombes, and J. Zerubia, Fully unsupervised fuzzy clustering with entropy criterion, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, pp.986-989, 2000.
DOI : 10.1109/ICPR.2000.903710

T. S. Ferguson, Bayesian density estimation by mixtures of normal distributions, Recent Advances in Statistics, pp.287-302, 1983.

D. Aldous, Exchangeability and related topics, in École d'été de probabilités de Saint-Flour, XIII - 1983, pp.1-198, 1985.

J. Sethuraman, A constructive definition of Dirichlet priors, Statistica Sinica, vol.4, pp.639-650, 1994.

T. N. Tran, R. Wehrens, and L. M. Buydens, KNN-kernel density-based clustering for high-dimensional multivariate data, Computational Statistics & Data Analysis, vol.51, issue.2, pp.513-525, 2006.
DOI : 10.1016/j.csda.2005.10.001

C. Robert and G. Casella, A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data, Statistical Science, vol.26, issue.1, pp.102-115, 2011.
DOI : 10.1214/10-STS351

A. Samé, G. Govaert, and C. Ambroise, A mixture model-based on-line CEM algorithm, in IDA, ser. Lecture Notes in Computer Science, pp.373-384, 2005.

G. Bougenière, C. Cariou, K. Chehdi, and A. Gay, Unsupervised non parametric data clustering by means of Bayesian inference and information theory, pp.101-108, 2007.

D. Gustafson and W. Kessel, Fuzzy clustering with a fuzzy covariance matrix, 1978 IEEE Conference on Decision and Control including the 17th Symposium on Adaptive Processes, pp.761-766, 1978.
DOI : 10.1109/CDC.1978.268028

J. Lafferty, A. McCallum, and F. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proc. 18th International Conf. on Machine Learning, pp.282-289, 2001.

G. Bougenière, C. Cariou, K. Chehdi, and A. Gay, Non parametric stochastic expectation maximization for data clustering, E-business and Telecommunications, ser. Communications in Computer and Information Science, pp.293-303, 2009.

M. Jardino, Unsupervised non-hierarchical entropy-based clustering, in Data Analysis, Classification, and Related Methods, ser. Studies in Classification, Data Analysis, and Knowledge Organization, pp.29-34, 2000.

R. Kneser and H. Ney, Improved clustering techniques for class-based statistical language modelling, Proc. Eurospeech 93, pp.973-976, 1993.

L. F. Kozachenko and N. N. Leonenko, Sample estimate of the entropy of a random vector, Problemy Peredachi Informatsii, vol.23, issue.2, pp.9-16, 1987.

M. N. Goria, N. N. Leonenko, V. V. Mergel, and P. L. Inverardi, A new class of random vector entropy estimators and its applications in testing statistical hypotheses, Journal of Nonparametric Statistics, vol.17, issue.3, pp.277-297, 2005.
DOI : 10.1080/104852504200026815

N. Leonenko, L. Pronzato, and V. Savani, Estimation of entropies and divergences via nearest neighbors, Tatra Mountains Mathematical Publications, vol.39, pp.265-273, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00322783

Y. Tarabalka, J. A. Benediktsson, and J. Chanussot, Spectral–Spatial Classification of Hyperspectral Imagery Based on Partitional Clustering Techniques, IEEE Transactions on Geoscience and Remote Sensing, vol.47, issue.8, pp.2973-2987, 2009.
DOI : 10.1109/TGRS.2009.2016214

K. Chehdi, M. Soltani, and C. Cariou, Pixel classification of large-size hyperspectral images by affinity propagation, Journal of Applied Remote Sensing, vol.8, issue.1, 2014.
DOI : 10.1117/1.JRS.8.083567
URL : https://hal.archives-ouvertes.fr/hal-01123981

H. Kuhn, The Hungarian method for the assignment problem, Naval Research Logistics Quarterly, vol.2, issue.1-2, pp.83-97, 1955.
DOI : 10.1002/nav.3800020109

C. D. Manning, P. Raghavan, and H. Schütze, An Introduction to Information Retrieval, 2008.
DOI : 10.1017/CBO9780511809071