A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., vol.2, issue.1, pp.183-202, 2009.

F. Bach, Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression, Journal of Machine Learning Research, vol.15, pp.595-627, 2014.

O. Brandière and M. Duflo, Les algorithmes stochastiques contournent-ils les pièges ?, Annales de l'I.H.P. Probabilités et Statistiques, pp.395-427, 1996.

M. Benaïm, Dynamics of stochastic approximation algorithms, Séminaire de Probabilités XXXIII, 2006.

M. Benaïm and M. W. Hirsch, Asymptotic pseudotrajectories and chain recurrent flows, with applications, Journal of Dynamics and Differential Equations, vol.24, issue.1, pp.141-176, 1996.
DOI : 10.1007/BF02218617

P. Billingsley, Convergence of Probability Measures, Wiley Series in Probability & Statistics, 1995.

S. Boucheron, G. Lugosi, and P. Massart, Concentration inequalities: A nonasymptotic theory of independence.
URL : https://hal.archives-ouvertes.fr/hal-00794821

M. Benaïm, M. Ledoux, and O. Raimond, Self-interacting diffusions, Probab. Theory Related Fields, pp.1-41, 2002.

F. Bach and E. Moulines, Non-asymptotic analysis of stochastic approximation algorithms for machine learning, Advances in Neural Information Processing Systems (NIPS), 2011.
URL : https://hal.archives-ouvertes.fr/hal-00608041

S. Boyd and L. Vandenberghe, Convex optimization, 2004.

A. Cabot, H. Engler, and S. Gadat, On the long time behavior of second order differential equations with asymptotically small dissipation, Trans. Amer. Math. Soc., vol.361, issue.11, pp.5983-6017, 2009.

A. Cabot, H. Engler, and S. Gadat, Second-order differential equations with asymptotically small dissipation and piecewise flat potentials, Proceedings of the Seventh Mississippi State-UAB Conference on Differential Equations and Computational Simulations, pp.33-38, 2009.

M. Duflo, Random iterative models, adaptive algorithms and stochastic approximations, Applications of Mathematics, vol.22, 1997.

N. Flammarion and F. Bach, From averaging to acceleration, there is only a step-size, Proceedings of the International Conference on Learning Theory (COLT), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01136945

M. Frank and P. Wolfe, An algorithm for quadratic programming, Naval Research Logistics Quarterly, vol.3, issue.1-2, pp.95-110, 1956.
DOI : 10.1002/nav.3800030109

S. Ghadimi and G. Lan, Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, SIAM Journal on Optimization, vol.23, issue.4, pp.2341-2368, 2013.
DOI : 10.1137/120880811
URL : http://arxiv.org/abs/1309.5549

S. Ghadimi and G. Lan, Accelerated gradient methods for nonconvex nonlinear and stochastic programming, Mathematical Programming, vol.19, issue.1, pp.59-99, 2016.
DOI : 10.1007/s10107-015-0871-8
URL : http://arxiv.org/abs/1310.3787

S. Gadat, L. Miclo, and F. Panloup, A stochastic model for speculative bubbles, ALEA: Latin American Journal of Probability and Mathematical Statistics, pp.491-532, 2015.
URL : https://hal.archives-ouvertes.fr/hal-00937447

S. Gadat and F. Panloup, Long time behaviour and stationary regime of memory gradient diffusions, Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, vol.50, issue.2, pp.564-601, 2014.
DOI : 10.1214/12-AIHP536
URL : https://hal.archives-ouvertes.fr/hal-00757068

P. Hartman, Ordinary Differential Equations, Classics in Applied Mathematics, 1982.

C. Hu, W. Pan, and J. T. Kwok, Accelerated gradient methods for stochastic optimization and online learning, Advances in Neural Information Processing Systems, 2009.

J. Kiefer and J. Wolfowitz, Stochastic Estimation of the Maximum of a Regression Function, The Annals of Mathematical Statistics, vol.23, issue.3, pp.462-466, 1952.
DOI : 10.1214/aoms/1177729392

H. Kushner and G. Yin, Stochastic approximation and recursive algorithms and applications, 2003.

G. Lan, An optimal method for stochastic composite optimization, Mathematical Programming, vol.24, issue.1-2, pp.365-397, 2012.
DOI : 10.1007/s10107-010-0434-y

V. Lemaire, An adaptive scheme for the approximation of dissipative systems, Stochastic Processes and their Applications, pp.1491-1518, 2007.

J. Lee, M. Simchowitz, M. Jordan, and B. Recht, Gradient descent converges to minimizers, 2016.

J. C. Mattingly, A. M. Stuart, and D. J. Higham, Ergodicity for SDEs and approximations: locally Lipschitz vector fields and degenerate noise, Stochastic Processes and their Applications, vol.101, issue.2, pp.185-232, 2002.
DOI : 10.1016/S0304-4149(02)00150-3
URL : http://doi.org/10.1016/s0304-4149(02)00150-3

S. P. Meyn and R. L. Tweedie, Stability of Markovian processes. III. Foster-Lyapunov criteria for continuous-time processes, Adv. in Appl. Probab., vol.25, issue.3, pp.518-548, 1993.

A. Nemirovski and D. Yudin, Problem complexity and method efficiency in optimization, Wiley-Interscience Series in Discrete Mathematics, 1983.

R. Pemantle, Non-convergence to unstable points in urn models and stochastic approximations, Annals of Probability, vol.18, pp.698-712, 1990.

B. Polyak and A. Juditsky, Acceleration of Stochastic Approximation by Averaging, SIAM Journal on Control and Optimization, vol.30, issue.4, pp.838-855, 1992.
DOI : 10.1137/0330046

H. Poincaré, Mémoire sur les courbes définies par une équation différentielle (IV), Journal de Mathématiques Pures et Appliquées, vol.4, pp.151-217, 1886.

B. Polyak, Some methods of speeding up the convergence of iteration methods, USSR Computational Mathematics and Mathematical Physics, vol.4, pp.1-17, 1964.

H. Robbins and S. Monro, A Stochastic Approximation Method, The Annals of Mathematical Statistics, vol.22, issue.3, pp.400-407, 1951.
DOI : 10.1214/aoms/1177729586

D. Ruppert, Efficient estimations from a slowly convergent Robbins-Monro process, 1988.

K. Stromberg, Probability for Analysts, 1994.

D. Stroock and S. Varadhan, Multidimensional diffusion processes, Classics in Mathematics, 2006.
DOI : 10.1007/3-540-28999-2

W. Su, S. Boyd, and E. J. Candès, A differential equation for modeling Nesterov's accelerated gradient method: theory and insights, Journal of Machine Learning Research, 2016.

T. Yang, Q. Lin, and Z. Li, Unified convergence analysis of stochastic momentum methods for convex and non-convex optimization, 2016.