L. van der Maaten, E. Postma, and J. van den Herik, Dimensionality reduction: a comparative review, Journal of Machine Learning Research, vol.10, pp.66-71, 2009.

C. J. C. Burges, Dimension reduction: A guided tour, 2010.

J. Friedman, T. Hastie, and R. Tibshirani, The elements of statistical learning, 2001.

M. Sugiyama, Dimensionality reduction of multimodal labeled data by local fisher discriminant analysis, The Journal of Machine Learning Research, vol.8, pp.1027-1061, 2007.

E. P. Xing, A. Y. Ng, M. I. Jordan et al., Distance metric learning with application to clustering with side-information, Advances in Neural Information Processing Systems, pp.505-512, 2003.

K. Q. Weinberger and L. K. Saul, Distance metric learning for large margin nearest neighbor classification, The Journal of Machine Learning Research, vol.10, pp.207-244, 2009.

P. Absil, R. Mahony, and R. Sepulchre, Optimization algorithms on matrix manifolds, 2009.
DOI : 10.1515/9781400830244

M. Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, NIPS, pp.2292-2300, 2013.

C. Villani, Optimal transport: old and new, 2008.
DOI : 10.1007/978-3-540-71050-9

J. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré, Iterative Bregman Projections for Regularized Transportation Problems, SIAM Journal on Scientific Computing, vol.37, issue.2, pp.1111-1138, 2015.
DOI : 10.1137/141000439
URL : https://hal.archives-ouvertes.fr/hal-01096124

M. Cuturi and A. Doucet, Fast computation of Wasserstein barycenters, ICML, 2014.

V. Seguy and M. Cuturi, Principal geodesic analysis for probability measures under the optimal transport metric, NIPS, pp.3294-3302

J. Solomon, R. Rustamov, L. Guibas, and A. Butscher, Wasserstein propagation for semi-supervised learning, ICML, pp.306-314, 2014.

N. Courty, R. Flamary, and D. Tuia, Domain Adaptation with Regularized Optimal Transport, ECML PKDD, 2014.
DOI : 10.1007/978-3-662-44848-9_18
URL : https://hal.archives-ouvertes.fr/hal-01018698

C. Frogner, C. Zhang, H. Mobahi, M. Araya, and T. Poggio, Learning with a Wasserstein loss, NIPS, pp.2044-2052

J. Mueller and T. Jaakkola, Principal differences analysis: Interpretable characterization of differences between distributions, NIPS, pp.1693-1701

B. Colson, P. Marcotte, and G. Savard, An overview of bilevel optimization, Annals of Operations Research, vol.89, issue.1, pp.235-256, 2007.
DOI : 10.1007/s10479-007-0176-2

K. B. Petersen and M. S. Pedersen, The matrix cookbook, p.15, 2008.

M. Schmidt, Minconf-projection methods for optimization with simple constraints in Matlab, 2008.

N. Boumal, B. Mishra, P. Absil, and R. Sepulchre, Manopt, a Matlab toolbox for optimization on manifolds, The Journal of Machine Learning Research, vol.15, issue.1, pp.1455-1459, 2014.

Y. Bengio, Gradient-Based Optimization of Hyperparameters, Neural Computation, vol.12, issue.8, pp.1889-1900, 2000.

N. Bonneel, G. Peyré, and M. Cuturi, Wasserstein barycentric coordinates, ACM Transactions on Graphics, vol.35, issue.4, 2016.
DOI : 10.1145/2897824.2925918
URL : https://hal.archives-ouvertes.fr/hal-01303148

L. van der Maaten and G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research, vol.9, pp.2579-2605, 2008.

J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang et al., DeCAF: a deep convolutional activation feature for generic visual recognition, Proceedings of The 31st International Conference on Machine Learning, pp.647-655, 2014.

G. Griffin, A. Holub, and P. Perona, Caltech-256 Object Category Dataset, 2007.