L. Bottou and Y. Le Cun, On-line learning for very large data sets, Applied Stochastic Models in Business and Industry, pp.137-151, 2005.

A. S. Nemirovski and D. B. Yudin, Problem complexity and method efficiency in optimization, 1983.

B. T. Polyak and A. B. Juditsky, Acceleration of Stochastic Approximation by Averaging, SIAM Journal on Control and Optimization, vol.30, issue.4, pp.838-855, 1992.
DOI : 10.1137/0330046

H. Robbins and S. Monro, A Stochastic Approximation Method, The Annals of Mathematical Statistics, vol.22, issue.3, pp.400-407, 1951.
DOI : 10.1214/aoms/1177729586

Y. Nesterov and J. P. Vial, Confidence level solutions for stochastic programming, Automatica, vol.44, issue.6, pp.1559-1568, 2008.
DOI : 10.1016/j.automatica.2008.01.017
URL : http://ecolu-info.unige.ch/~logilab/reports/GradStoc.ps

A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust Stochastic Approximation Approach to Stochastic Programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009.
DOI : 10.1137/070704277
URL : https://hal.archives-ouvertes.fr/hal-00976649

S. Shalev-Shwartz, Y. Singer, and N. Srebro, Pegasos: Primal estimated sub-gradient solver for SVM, Proceedings of the 24th international conference on Machine learning, ICML '07, 2007.
DOI : 10.1145/1273496.1273598

L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, Journal of Machine Learning Research, vol.11, pp.2543-2596, 2010.

F. Bach and E. Moulines, Non-asymptotic analysis of stochastic approximation algorithms for machine learning, Advances in Neural Information Processing Systems (NIPS), 2011.
URL : https://hal.archives-ouvertes.fr/hal-00608041

F. Bach and E. Moulines, Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Advances in Neural Information Processing Systems (NIPS), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00831977

A. Dieuleveut, N. Flammarion, and F. Bach, Harder, better, faster, stronger convergence rates for least-squares regression, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01275431

N. Le Roux, M. Schmidt, and F. Bach, A stochastic gradient method with an exponential convergence rate for finite training sets, Advances in Neural Information Processing Systems, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00674995

R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems, 2013.

A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems, pp.1646-1654, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843

M. V. Solodov, Incremental gradient algorithms with stepsizes bounded away from zero, Computational Optimization and Applications, vol.11, issue.1, pp.23-35, 1998.

M. Schmidt and N. Le Roux, Fast convergence of stochastic gradient descent under a strong growth condition, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00855113

A. Dieuleveut and F. Bach, Nonparametric stochastic approximation with large step-sizes, The Annals of Statistics, pp.1363-1399, 2016.
DOI : 10.1214/15-aos1391
URL : https://hal.archives-ouvertes.fr/hal-01053831

T. Zhang, Statistical behavior and consistency of classification methods based on convex risk minimization, The Annals of Statistics, vol.32, issue.1, pp.56-85, 2004.
DOI : 10.1214/aos/1079120130

P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe, Convexity, classification, and risk bounds, Journal of the American Statistical Association, vol.101, issue.473, pp.138-156, 2006.

E. Mammen and A. Tsybakov, Smooth discrimination analysis, The Annals of Statistics, pp.1808-1829, 1999.

J. Audibert and A. B. Tsybakov, Fast learning rates for plug-in classifiers, The Annals of Statistics, pp.608-633, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00160849

V. Koltchinskii and O. Beznosova, Exponential Convergence Rates in Classification, International Conference on Computational Learning Theory, 2005.
DOI : 10.1007/11503415_20

P. Jain, S. M. Kakade, R. Kidambi, P. Netrapalli, and A. Sidford, Parallelizing stochastic approximation through mini-batching and tail-averaging, 2016.

J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, 2004.
DOI : 10.1017/CBO9780511809682

B. Schölkopf and A. J. Smola, Learning with Kernels, 2002.

C. A. Micchelli, Y. Xu, and H. Zhang, Universal kernels, Journal of Machine Learning Research, vol.7, pp.2651-2667, 2006.

L. Devroye, L. Györfi, and G. Lugosi, A probabilistic theory of pattern recognition, 2013.
DOI : 10.1007/978-1-4612-0711-5

A. Caponnetto and E. De Vito, Optimal Rates for the Regularized Least-Squares Algorithm, Foundations of Computational Mathematics, vol.7, issue.3, pp.331-368, 2007.
DOI : 10.1007/s10208-006-0196-8

K. Fukumizu, F. Bach, and M. I. Jordan, Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces, Journal of Machine Learning Research, vol.5, pp.73-99, 2004.
DOI : 10.21236/ADA446572

R. A. Adams and J. J. F. Fournier, Sobolev Spaces, Academic Press, 2003.

A. Défossez and F. Bach, Constant step size least-mean-square: Bias-variance trade-offs and optimal sampling distributions, Proc. AISTATS, 2015.

S. M. Kakade and A. Tewari, On the generalization ability of online strongly convex programming algorithms, Advances in Neural Information Processing Systems, 2009.

C. Ciliberto, L. Rosasco, and A. Rudi, A consistent regularization approach for structured prediction, Advances in Neural Information Processing Systems, 2016.

A. Osokin, F. Bach, and S. Lacoste-Julien, On structured prediction theory with calibrated convex surrogate losses, Advances in Neural Information Processing Systems, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01611691

B. Taskar, V. Chatalbashev, D. Koller, and C. Guestrin, Learning structured prediction models, Proceedings of the 22nd international conference on Machine learning, ICML '05, 2005.
DOI : 10.1145/1102351.1102464

A. Rudi and L. Rosasco, Generalization properties of learning with random features, Advances in Neural Information Processing Systems, 2017.

I. Pinelis, Optimum bounds for the distributions of martingales in Banach spaces, The Annals of Probability, pp.1679-1706, 1994.

L. Rosasco, M. Belkin, and E. De Vito, On learning with integral operators, Journal of Machine Learning Research, vol.11, pp.905-934, 2010.