G. Alain and Y. Bengio, What regularized auto-encoders learn from the data-generating distribution, Journal of Machine Learning Research, vol.15, issue.1, pp.3563-3593, 2014.

E. Aljalbout, V. Golkov, Y. Siddiqui, and D. Cremers, Clustering with deep learning: Taxonomy and new methods, 2018.

M. Arjovsky, S. Chintala, and L. Bottou, Wasserstein generative adversarial networks, Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, pp.214-223, 2017.

P. Baldi and K. Hornik, Neural networks and principal component analysis: Learning from examples without local minima, Neural Networks, vol.2, issue.1, pp.53-58, 1989.
DOI : 10.1016/0893-6080(89)90014-2

Y. Bengio, A. Courville, and P. Vincent, Representation Learning: A Review and New Perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1798-1828, 2013.
DOI : 10.1109/TPAMI.2013.50
URL : http://www.cs.princeton.edu/courses/archive/spring13/cos598C/Representation Learning - A Review and New Perspectives.pdf

P. Bojanowski and A. Joulin, Unsupervised learning by predicting noise. arXiv preprint, 2017.

T.-H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng et al., PCANet: A Simple Deep Learning Baseline for Image Classification?, IEEE Transactions on Image Processing, vol.24, issue.12, pp.5017-5032, 2015.
DOI : 10.1109/TIP.2015.2475625

W. Chang, On Using Principal Components Before Separating a Mixture of Two Multivariate Normal Distributions, Applied Statistics, vol.32, issue.3, pp.267-275, 1983.
DOI : 10.2307/2347949

M. Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, Advances in Neural Information Processing Systems 26, pp.2292-2300, 2013.

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological), pp.1-38, 1977.

N. Dilokthanakul, P. A. M. Mediano, M. Garnelo, M. C. H. Lee, H. Salimbeni et al., Deep unsupervised clustering with Gaussian mixture variational autoencoders. arXiv preprint, 2016.

L. Dinh, J. Sohl-Dickstein, and S. Bengio, Density estimation using Real NVP, International Conference on Learning Representations, 2017.

K. Ghasedi Dizaji, A. Herandi, C. Deng, W. Cai, and H. Huang, Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization, 2017 IEEE International Conference on Computer Vision (ICCV), pp.5747-5756, 2017.

R. O. Duda, P. E. Hart, and D. G. Stork, Pattern classification, 2012.

V. Dumoulin, I. Belghazi, B. Poole, O. Mastropietro, A. Lamb et al., Adversarially learned inference, International Conference on Learning Representations, 2017.

C. Fraley and A. E. Raftery, Model-Based Clustering, Discriminant Analysis, and Density Estimation, Journal of the American Statistical Association, vol.97, issue.458, pp.611-631, 2002.
DOI : 10.1198/016214502760047131

A. Genevay, G. Peyré, and M. Cuturi, Learning Generative Models with Sinkhorn Divergences, 2017.

Y. Goldberg, Neural network methods for natural language processing. Synthesis Lectures on Human Language Technologies, pp.1-309, 2017.

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., Generative adversarial nets, Advances in Neural Information Processing Systems, pp.2672-2680, 2014.

I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, Improved training of Wasserstein GANs, 2017.

W. Hu, T. Miyato, S. Tokui, E. Matsumoto, and M. Sugiyama, Learning discrete representations via information maximizing self-augmented training, Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, pp.1558-1567, 2017.

Z. Jiang, Y. Zheng, H. Tan, B. Tang, and H. Zhou, Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering, Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017.
DOI : 10.24963/ijcai.2017/273

O. Kilinc and I. Uysal, Learning latent representations in neural networks for clustering through pseudo supervision and graph-based activity regularization, International Conference on Learning Representations, 2018.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization. arXiv preprint, 2014.

D. P. Kingma and M. Welling, Auto-encoding variational Bayes, Proceedings of the 2nd International Conference on Learning Representations (ICLR), 2013.

H. W. Kuhn, The Hungarian method for the assignment problem, Naval Research Logistics Quarterly, vol.2, issue.1-2, pp.83-97, 1955.

Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015.
DOI : 10.1038/nature14539

D. D. Lewis, Y. Yang, T. G. Rose, and F. Li, RCV1: A new benchmark collection for text categorization research, Journal of Machine Learning Research, vol.5, pp.361-397, 2004.

L. van der Maaten and G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research, vol.9, pp.2579-2605, 2008.

J. Mairal, M. Elad, and G. Sapiro, Sparse Representation for Color Image Restoration, IEEE Transactions on Image Processing, vol.17, issue.1, pp.53-69, 2008.
DOI : 10.1109/TIP.2007.911828

A. Y. Ng, M. I. Jordan, and Y. Weiss, On spectral clustering: Analysis and an algorithm, Advances in Neural Information Processing Systems, pp.849-856, 2001.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang et al., Automatic differentiation in PyTorch, NIPS-W, 2017.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

J. Ponce and D. Forsyth, Computer vision: a modern approach, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01063327

G. Schwarz, Estimating the dimension of a model, The Annals of Statistics, pp.461-464, 1978.

C. Song, Y. Huang, F. Liu, Z. Wang, and L. Wang, Deep auto-encoder based clustering, Intelligent Data Analysis, pp.65-76, 2014.

S. Sonoda and N. Murata, Decoding stacked denoising autoencoders. arXiv preprint, 2016.

M. Stephens, Dealing with label switching in mixture models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.62, issue.4, pp.795-809, 2000.
DOI : 10.1111/1467-9868.00265

A. Stisen, H. Blunck, S. Bhattacharya, T. S. Prentow, M. B. Kjærgaard et al., Smart Devices are Different, Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, SenSys '15, pp.127-140, 2015.
DOI : 10.1145/1631040.1631042

I. Tolstikhin, O. Bousquet, S. Gelly, and B. Schoelkopf, Wasserstein auto-encoders, International Conference on Learning Representations, pp.7-8, 2018.

M. A. Turk and A. P. Pentland, Face recognition using eigenfaces, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '91), pp.586-591, 1991.

C. Villani, Optimal transport: old and new, 2008.
DOI : 10.1007/978-3-540-71050-9

P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, Journal of Machine Learning Research, vol.11, pp.3371-3408, 2010.

U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing, vol.17, issue.4, pp.395-416, 2007.

J. Xie, R. Girshick, and A. Farhadi, Unsupervised deep embedding for clustering analysis. arXiv preprint, 2015.

B. Yang, X. Fu, N. D. Sidiropoulos, and M. Hong, Towards k-means-friendly spaces: Simultaneous deep learning and clustering. arXiv preprint, 2016.

J. Yang, D. Parikh, and D. Batra, Joint Unsupervised Learning of Deep Representations and Image Clusters, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.556

L. Zelnik-Manor and P. Perona, Self-tuning spectral clustering, Advances in Neural Information Processing Systems, pp.1601-1608, 2004.