F. Aurenhammer, F. Hoffmann, and B. Aronov, Minkowski-Type Theorems and Least-Squares Clustering, Algorithmica, vol.20, issue.1, pp.61-76, 1998.
DOI : 10.1007/PL00009187

F. Bassetti, A. Bodini, and E. Regazzini, On minimum Kantorovich distance estimators, Statistics & Probability Letters, vol.76, issue.12, pp.1298-1302, 2006.
DOI : 10.1016/j.spl.2006.02.001

G. Carlier, V. Duval, G. Peyré, and B. Schmitzer, Convergence of entropic schemes for optimal transport and gradient flows, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01246086

R. Cominetti and J. San Martín, Asymptotic analysis of the exponential penalty trajectory in linear programming, Mathematical Programming, vol.67, issue.1-3, pp.169-187, 1994.
DOI : 10.1007/BF01582220

M. Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, Adv. in Neural Information Processing Systems, pp.2292-2300, 2013.

A. Dieuleveut and F. Bach, Non-parametric stochastic approximation with large step sizes, arXiv preprint, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01053831

J. Franklin and J. Lorenz, On the scaling of multidimensional matrices, Linear Algebra and its Applications, vol.114-115, pp.717-735, 1989.
DOI : 10.1016/0024-3795(89)90490-4

C. Frogner, C. Zhang, H. Mobahi, M. Araya-Polo, and T. Poggio, Learning with a Wasserstein loss, Adv. in Neural Information Processing Systems, pp.2044-2052, 2015.

A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola, A kernel method for the two-sample-problem, Adv. in Neural Information Processing Systems, pp.513-520, 2006.

L. Kantorovich, On the transfer of masses (in Russian), Doklady Akademii Nauk, vol.37, issue.2, pp.227-229, 1942.

M. J. Kusner, Y. Sun, N. I. Kolkin, and K. Q. Weinberger, From word embeddings to document distances, ICML, 2015.

Q. Mérigot, A Multiscale Approach to Optimal Transport, Computer Graphics Forum, vol.30, issue.5, pp.1583-1592, 2011.
DOI : 10.1111/j.1467-8659.2011.02032.x

G. Montavon, K. Müller, and M. Cuturi, Wasserstein training of restricted Boltzmann machines, Adv. in Neural Information Processing Systems, 2016.

J. Pennington, R. Socher, and C. D. Manning, GloVe: Global Vectors for Word Representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1532-1543, 2014.
DOI : 10.3115/v1/D14-1162

B. Polyak and A. Juditsky, Acceleration of Stochastic Approximation by Averaging, SIAM Journal on Control and Optimization, vol.30, issue.4, pp.838-855, 1992.
DOI : 10.1137/0330046

A. Rahimi and B. Recht, Random features for large-scale kernel machines, Adv. in Neural Information Processing Systems, pp.1177-1184, 2007.

Y. Rubner, C. Tomasi, and L. J. Guibas, The earth mover's distance as a metric for image retrieval, International Journal of Computer Vision, vol.40, issue.2, pp.99-121, 2000.
DOI : 10.1023/A:1026543900054

F. Santambrogio, Optimal transport for applied mathematicians, 2015.
DOI : 10.1007/978-3-319-20828-2

M. Schmidt, N. Le Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, Mathematical Programming, 2016.
URL : https://hal.archives-ouvertes.fr/hal-00860051

R. Sinkhorn, A Relationship Between Arbitrary Positive Matrices and Doubly Stochastic Matrices, The Annals of Mathematical Statistics, vol.35, issue.2, pp.876-879, 1964.
DOI : 10.1214/aoms/1177703591

J. Solomon, F. de Goes, G. Peyré, M. Cuturi, A. Butscher et al., Convolutional Wasserstein distances, ACM Transactions on Graphics, vol.34, issue.4, pp.66:1-66:11, 2015.
DOI : 10.1145/2766963
URL : https://hal.archives-ouvertes.fr/hal-01188953

I. Steinwart and A. Christmann, Support vector machines, 2008.

C. Villani, Topics in Optimal Transportation, Graduate Studies in Mathematics, AMS, vol.58, 2003.
DOI : 10.1090/gsm/058

G. Wu, E. Chang, Y. K. Chen, and C. Hughes, Incremental approximate matrix factorization for speeding up support vector machines, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '06, pp.760-766, 2006.
DOI : 10.1145/1150402.1150500