N. Aronszajn, Theory of reproducing kernels, Transactions of the American Mathematical Society, vol.68, issue.3, pp.337-404, 1950.
DOI : 10.1090/S0002-9947-1950-0051437-7

F. Bach, Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression, J. Mach. Learn. Res, vol.15, issue.1, pp.595-627, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00804431

F. Bach and E. Moulines, Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Advances in Neural Information Processing Systems, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00831977

A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.
DOI : 10.1137/080716542

L. Birgé, An alternative point of view on Lepski's method, Lecture Notes-Monograph Series, pp.113-133, 2001.

L. Bottou and O. Bousquet, The tradeoffs of large scale learning, Advances in Neural Information Processing Systems, 2008.

A. Caponnetto and E. De Vito, Optimal Rates for the Regularized Least-Squares Algorithm, Foundations of Computational Mathematics, vol.7, issue.3, pp.331-368, 2007.
DOI : 10.1007/s10208-006-0196-8

A. Cotter, O. Shamir, N. Srebro, and K. Sridharan, Better mini-batch algorithms via accelerated gradient methods, Advances in Neural Information Processing Systems, 2011.

F. Cucker and S. Smale, Best Choices for Regularization Parameters in Learning Theory: On the Bias-Variance Problem, Foundations of Computational Mathematics, vol.2, issue.4, pp.413-418, 2002.
DOI : 10.1007/s102080010030

E. De Vito, A. Caponnetto, and L. Rosasco, Model Selection for Regularized Least-Squares Algorithm in Learning Theory, Foundations of Computational Mathematics, vol.5, issue.1, pp.59-85, 2005.
DOI : 10.1007/s10208-004-0134-1

A. Défossez and F. Bach, Averaged least-mean-squares: bias-variance trade-offs and optimal sampling distributions, Proceedings of the International Conference on Artificial Intelligence and Statistics, 2015.

O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao, Optimal distributed online prediction using mini-batches, J. Mach. Learn. Res, vol.13, issue.1, pp.165-202, 2012.

O. Devolder, F. Glineur, and Y. Nesterov, First-order methods of smooth convex optimization with inexact oracle, Mathematical Programming, vol.146, issue.1-2, pp.37-75, 2014.
DOI : 10.1007/s10107-013-0677-5

A. Dieuleveut and F. Bach, Non-parametric stochastic approximation with large step sizes, Annals of Statistics, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01053831

M. Duflo, Random Iterative Models, 1997.
DOI : 10.1007/978-3-662-12880-0

H. W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems, 1996.

N. Flammarion and F. Bach, From averaging to acceleration, there is only a step-size, Proceedings of the International Conference on Learning Theory (COLT), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01136945

C. Gu, Smoothing Spline ANOVA Models, 2013.

L. Györfi and H. Walk, On the Averaged Stochastic Approximation for Linear Regression, SIAM Journal on Control and Optimization, vol.34, issue.1, pp.31-61, 1996.
DOI : 10.1137/S0363012992226661

L. Györfi, M. Kohler, A. Krzyzak, and H. Walk, A Distribution-Free Theory of Nonparametric Regression, 2006.
DOI : 10.1007/b97848

T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2009.

D. Hsu, S. M. Kakade, and T. Zhang, Random Design Analysis of Ridge Regression, Foundations of Computational Mathematics, vol.14, issue.3, pp.569-600, 2014.
DOI : 10.1007/s10208-014-9192-1

H. Kushner and G. G. Yin, Stochastic Approximation and Recursive Algorithms and Applications, 2003.

G. Lan, An optimal method for stochastic composite optimization, Mathematical Programming, vol.133, issue.1-2, pp.365-397, 2012.
DOI : 10.1007/s10107-010-0434-y

P. Massart, Concentration Inequalities and Model Selection, Lecture Notes in Mathematics, 2007.

P. McCullagh and J. A. Nelder, Generalized Linear Models. Monographs on Statistics and Applied Probability, 1989.

A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust Stochastic Approximation Approach to Stochastic Programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009.
DOI : 10.1137/070704277
URL : https://hal.archives-ouvertes.fr/hal-00976649

Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady, vol.27, issue.2, pp.372-376, 1983.

Y. Nesterov, Introductory Lectures on Convex Optimization, vol.87 of Applied Optimization, 2004.
DOI : 10.1007/978-1-4419-8853-9

B. O'Donoghue and E. Candès, Adaptive restart for accelerated gradient schemes, Foundations of Computational Mathematics, pp.1-18, 2013.

B. T. Polyak, Some methods of speeding up the convergence of iteration methods, USSR Computational Mathematics and Mathematical Physics, vol.4, issue.5, pp.1-17, 1964.
DOI : 10.1016/0041-5553(64)90137-5

B. T. Polyak, Introduction to Optimization. Translations Series in Mathematics and Engineering, 1987.

B. T. Polyak and A. B. Juditsky, Acceleration of Stochastic Approximation by Averaging, SIAM Journal on Control and Optimization, vol.30, issue.4, pp.838-855, 1992.
DOI : 10.1137/0330046

H. Robbins and S. Monro, A stochastic approximation method, The Annals of Mathematical Statistics, vol.22, issue.3, pp.400-407, 1951.

A. Rudi, R. Camoriano, and L. Rosasco, Less is More: Nyström Computational Regularization, Advances in Neural Information Processing Systems 28, 2015.

B. Schölkopf and A. J. Smola, Learning with Kernels, 2002.

S. Shalev-Shwartz, O. Shamir, N. Srebro, and K. Sridharan, Stochastic convex optimization, Proceedings of the International Conference on Learning Theory (COLT), 2009.

I. Steinwart and A. Christmann, Support Vector Machines, Series in Information Science and Statistics, 2008.

P. Tarrès and Y. Yao, Online learning as stochastic approximation of regularization paths, IEEE Transactions on Information Theory, pp.5716-5735, 2011.

A. B. Tsybakov, Optimal Rates of Aggregation, Proceedings of the Annual Conference on Computational Learning Theory, 2003.
DOI : 10.1007/978-3-540-45167-9_23
URL : https://hal.archives-ouvertes.fr/hal-00104867

A. B. Tsybakov, Introduction to Nonparametric Estimation, 2008.
DOI : 10.1007/b13794

L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, J. Mach. Learn. Res, vol.11, pp.2543-2596, 2010.

Y. Yao, L. Rosasco, and A. Caponnetto, On Early Stopping in Gradient Descent Learning, Constructive Approximation, vol.26, issue.2, pp.289-315, 2007.
DOI : 10.1007/s00365-006-0663-2

Y. Ying and M. Pontil, Online Gradient Descent Learning Algorithms, Foundations of Computational Mathematics, vol.8, issue.5, 2008.
DOI : 10.1007/s10208-006-0237-y

T. Zhang, Solving large scale linear prediction problems using stochastic gradient descent algorithms, Proceedings of the Twenty-First International Conference on Machine Learning (ICML), 2004.
DOI : 10.1145/1015330.1015332