R. Abraham and J. Robbin, Transversal mappings and flows, 1967.

P. Absil, R. Mahony, and B. Andrews, Convergence of the iterates of descent methods for analytic cost functions, SIAM Journal on Optimization, vol.16, issue.2, pp.531-547, 2005.

L. Ambrosio, N. Gigli, and G. Savaré, Gradient flows: in metric spaces and in the space of probability measures, 2008.

F. Bach, Breaking the curse of dimensionality with convex neural networks, Journal of Machine Learning Research, vol.18, issue.19, pp.1-53, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01098505

A. Blanchet and J. Bolte, A family of functional inequalities: Łojasiewicz inequalities and displacement convex functions, 2016.
DOI : 10.1016/j.jfa.2018.06.014

URL : http://publications.ut-capitole.fr/26111/1/Blanchet_2611.pdf

N. Boyd, G. Schiebinger, and B. Recht, The alternating descent conditional gradient method for sparse inverse problems, SIAM Journal on Optimization, vol.27, issue.2, pp.616-639, 2017.
DOI : 10.1109/camsap.2015.7383735

URL : http://arxiv.org/pdf/1507.01562

K. Bredies and H. K. Pikkarainen, Inverse problems in spaces of measures, ESAIM: Control, Optimisation and Calculus of Variations, vol.19, pp.190-218, 2013.
DOI : 10.1051/cocv/2011205

URL : https://www.esaim-cocv.org/articles/cocv/pdf/2013/01/cocv110109.pdf

F. E. Browder, Fixed point theory and nonlinear problems, Proc. Sym. Pure. Math, vol.39, pp.49-88, 1983.

P. Catala, V. Duval, and G. Peyré, A low-rank approach to off-the-grid sparse deconvolution, Journal of Physics: Conference Series, vol.904, issue.1, p.12015, 2017.
DOI : 10.1088/1742-6596/904/1/012015

URL : https://hal.archives-ouvertes.fr/hal-01672896

D. L. Cohn, Measure theory, vol.165, 1980.

P. L. Combettes and J.-C. Pesquet, Proximal splitting methods in signal processing, Fixed-point algorithms for inverse problems in science and engineering, pp.185-212, 2011.

Y. de Castro and F. Gamboa, Exact reconstruction using Beurling minimal extrapolation, Journal of Mathematical Analysis and Applications, vol.395, issue.1, pp.336-354, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00678423

V. Duval and G. Peyré, Exact support recovery for sparse spikes deconvolution, Foundations of Computational Mathematics, vol.15, issue.5, pp.1315-1355, 2015.
DOI : 10.1007/s10208-014-9228-6

URL : https://hal.archives-ouvertes.fr/hal-00839635

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.

S. Gunasekar, B. E. Woodworth, S. Bhojanapalli, B. Neyshabur, and N. Srebro, Implicit regularization in matrix factorization, Advances in Neural Information Processing Systems, vol.30, 2017.
DOI : 10.1109/ita.2018.8503198

URL : http://arxiv.org/pdf/1705.09280

B. D. Haeffele and R. Vidal, Global optimality in neural network training, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.7331-7339, 2017.

D. Hauer and J. Mazón, Kurdyka-Łojasiewicz-Simon inequality for gradient flows in metric spaces, 2017.

S. Haykin, Neural Networks: A Comprehensive Foundation, 1994.

M. Jaggi, Revisiting Frank-Wolfe: Projection-free sparse convex optimization, Proceedings of the International Conference on Machine Learning (ICML), 2013.

M. Journée, F. Bach, P. Absil, and R. Sepulchre, Low-rank optimization on the cone of positive semidefinite matrices, SIAM Journal on Optimization, vol.20, issue.5, pp.2327-2351, 2010.

H. Kushner and G. G. Yin, Stochastic approximation and recursive algorithms and applications, vol.35, 2003.

J. Lasserre, Moments, positive polynomials and their applications, vol.1, 2010.
DOI : 10.1142/p665

Y. Li and Y. Yuan, Convergence analysis of two-layer neural networks with ReLU activation, Advances in Neural Information Processing Systems, pp.597-607, 2017.

S. Mei, A. Montanari, and P. Nguyen, A mean field view of the landscape of two-layer neural networks, Proceedings of the National Academy of Sciences, vol.115, issue.33, pp.7665-7671, 2018.

A. Nitanda and T. Suzuki, Stochastic particle gradient descent for infinite ensembles, 2017.

C. Poon, N. Keriven, and G. Peyré, A dual certificates analysis of compressive off-the-grid recovery, 2018.

R. T. Rockafellar, Convex Analysis, 1997.

G. M. Rotskoff and E. Vanden-Eijnden, Neural networks as interacting particle systems: Asymptotic convexity of the loss landscape and universal scaling of the approximation error, 2018.

F. Santambrogio, Optimal transport for applied mathematicians, 2015.
DOI : 10.1007/978-3-319-20828-2

F. Santambrogio, Euclidean, metric, and Wasserstein gradient flows: an overview, Bulletin of Mathematical Sciences, vol.7, issue.1, pp.87-154, 2017.

D. Scieur, V. Roulet, F. Bach, and A. d'Aspremont, Integration methods and optimization algorithms, Advances in Neural Information Processing Systems, pp.1109-1118, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01474045

J. Sirignano and K. Spiliopoulos, Mean field analysis of neural networks, 2018.

M. Soltanolkotabi, A. Javanmard, and J. D. Lee, Theoretical insights into the optimization landscape of over-parameterized shallow neural networks, 2017.

D. Soudry and E. Hoffer, Exponentially vanishing sub-optimal local minima in multilayer neural networks, 2017.

L. Venturi, A. Bandeira, and J. Bruna, Neural networks with finite intrinsic dimension have no spurious valleys, 2018.

C. Wang, Y. Wang, and R. Schapire, Functional Frank-Wolfe boosting for general loss functions, 2015.

H. Whitney, A function not constant on a connected set of critical points, Duke Mathematical Journal, vol.1, issue.4, pp.514-517, 1935.