G. Wahba, Spline Models for Observational Data, SIAM, 1990.

B. Schölkopf and A. J. Smola, Learning with Kernels, MIT Press, 2002.

J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
DOI : 10.1017/CBO9780511809682

A. B. Tsybakov, Introduction to Nonparametric Estimation, Springer, 2008.
DOI : 10.1007/b13794

F. Cucker and S. Smale, On the mathematical foundations of learning, Bulletin of the American Mathematical Society, vol.39, issue.1, pp.1-49, 2002.

F. Cucker and S. Smale, Best Choices for Regularization Parameters in Learning Theory: On the Bias-Variance Problem, Foundations of Computational Mathematics, vol.2, issue.4, pp.413-418, 2002.
DOI : 10.1007/s102080010030

E. De Vito, A. Caponnetto, and L. Rosasco, Model Selection for Regularized Least-Squares Algorithm in Learning Theory, Foundations of Computational Mathematics, vol.5, issue.1, pp.59-85, 2005.
DOI : 10.1007/s10208-004-0134-1

S. Smale and D.-X. Zhou, Learning Theory Estimates via Integral Operators and Their Approximations, Constructive Approximation, vol.26, issue.2, pp.153-172, 2007.
DOI : 10.1007/s00365-006-0659-y

A. Caponnetto and E. De Vito, Optimal Rates for the Regularized Least-Squares Algorithm, Foundations of Computational Mathematics, vol.7, issue.3, pp.331-368, 2007.
DOI : 10.1007/s10208-006-0196-8

I. Steinwart, D. Hush, and C. Scovel, Optimal rates for regularized least squares regression, Proceedings of the 22nd Annual Conference on Learning Theory (COLT), 2009.

G. Blanchard and N. Krämer, Optimal learning rates for kernel conjugate gradient regression, Advances in Neural Information Processing Systems (NIPS), pp.226-234, 2010.

G. Raskutti, M. J. Wainwright, and B. Yu, Early stopping for non-parametric regression: An optimal data-dependent stopping rule, Proceedings of the 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2011.
DOI : 10.1109/Allerton.2011.6120320

T. Zhang and B. Yu, Boosting with early stopping: Convergence and consistency, The Annals of Statistics, vol.33, issue.4, pp.1538-1579, 2005.
DOI : 10.1214/009053605000000255

L. Rosasco, A. Tacchetti, and S. Villa, Regularization by Early Stopping for Online Learning Algorithms, ArXiv e-prints, 2014.

Y. Ying and M. Pontil, Online Gradient Descent Learning Algorithms, Foundations of Computational Mathematics, vol.8, issue.5, 2008.
DOI : 10.1007/s10208-006-0237-y
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.98.8413

P. Tarrès and Y. Yao, Online learning as stochastic approximation of regularization paths, ArXiv e-prints 1103, 2011.

H. Robbins and S. Monro, A Stochastic Approximation Method, The Annals of Mathematical Statistics, vol.22, issue.3, pp.400-407, 1951.
DOI : 10.1214/aoms/1177729586

S. Shalev-Shwartz, Online Learning and Online Convex Optimization, Foundations and Trends in Machine Learning, pp.107-194, 2011.
DOI : 10.1561/2200000018

F. Bach and E. Moulines, Non-asymptotic analysis of stochastic approximation algorithms for machine learning, Advances in Neural Information Processing Systems (NIPS), 2011.
URL : https://hal.archives-ouvertes.fr/hal-00608041

F. Bach and E. Moulines, Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Advances in Neural Information Processing Systems (NIPS), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00831977

B. Thomson, J. Bruckner, and A. M. Bruckner, Elementary real analysis, 2000.

P. Mikusinski and E. Weiss, The Bochner Integral, ArXiv e-prints, 2014.

N. Aronszajn, Theory of reproducing kernels, Transactions of the American Mathematical Society, vol.68, issue.3, pp.337-404, 1950.
DOI : 10.1090/S0002-9947-1950-0051437-7

H. Brezis, Analyse fonctionnelle : théorie et applications, Masson, 1983.

C. A. Micchelli, Y. Xu, and H. Zhang, Universal kernels, The Journal of Machine Learning Research, vol.7, pp.2651-2667, 2006.

B. K. Sriperumbudur, K. Fukumizu, and G. R. Lanckriet, Universality, characteristic kernels and RKHS embedding of measures, The Journal of Machine Learning Research, vol.12, pp.2389-2410, 2011.

D. Hsu, S. M. Kakade, and T. Zhang, Random Design Analysis of Ridge Regression, Foundations of Computational Mathematics, vol.14, issue.3, pp.569-600, 2014.
DOI : 10.1007/s10208-014-9192-1

Y. Yao, L. Rosasco, and A. Caponnetto, On Early Stopping in Gradient Descent Learning, Constructive Approximation, vol.26, issue.2, pp.289-315, 2007.
DOI : 10.1007/s00365-006-0663-2

G. Kimeldorf and G. Wahba, Some results on Tchebycheffian spline functions, Journal of Mathematical Analysis and Applications, vol.33, issue.1, pp.82-95, 1971.
DOI : 10.1016/0022-247X(71)90184-3

M. W. Mahoney, Randomized Algorithms for Matrices and Data, Foundations and Trends in Machine Learning, pp.123-224, 2011.

C. Williams and M. Seeger, Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems (NIPS), 2001.

F. Bach, Sharp analysis of low-rank kernel matrix approximations, Proceedings of the International Conference on Learning Theory (COLT), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00723365

O. Dekel, S. Shalev-Shwartz, and Y. Singer, The Forgetron: A Kernel-Based Perceptron on a Budget, Advances in Neural Information Processing Systems (NIPS), 2005.
DOI : 10.1137/060666998

A. Bordes, S. Ertekin, J. Weston, and L. Bottou, Fast kernel classifiers with online and active learning, Journal of Machine Learning Research, vol.6, pp.1579-1619, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00752361

J. Kivinen, A. J. Smola, and R. C. Williamson, Online Learning with Kernels, IEEE Transactions on Signal Processing, vol.52, issue.8, pp.2165-2176, 2004.
DOI : 10.1109/TSP.2004.830991

Y. Yao, A Dynamic Theory of Learning, 2006.

T. Zhang, Solving large scale linear prediction problems using stochastic gradient descent algorithms, Proceedings of the Twenty-First International Conference on Machine Learning (ICML '04), 2004.
DOI : 10.1145/1015330.1015332

E. Hazan and S. Kale, Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization, Proceedings of the International Conference on Learning Theory (COLT), 2011.

H. W. Engl, M. Hanke, and A. Neubauer, Regularization of inverse problems, 1996.

S. Lacoste-Julien, M. Schmidt, and F. Bach, A simpler approach to obtaining an O(1/t) rate for the stochastic projected subgradient method, ArXiv e-prints 1212.2002, 2012.

A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust Stochastic Approximation Approach to Stochastic Programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009.
DOI : 10.1137/070704277
URL : https://hal.archives-ouvertes.fr/hal-00976649

I. M. Johnstone, Minimax Bayes, Asymptotic Minimax and Sparse Wavelet Priors, Statistical Decision Theory and Related Topics, pp.303-326, 1994.
DOI : 10.1007/978-1-4612-2618-5_23
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.8956

M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, National Bureau of Standards, 1964.

N. Flammarion and F. Bach, From averaging to acceleration, there is only a stepsize, Proceedings of the International Conference on Learning Theory (COLT), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01136945

A. N. Kolmogorov and S. V. Fomin, Elements of the theory of functions and functional analysis, 1999.

F. Paulin, Topologie, analyse et calcul différentiel, lecture notes, École Normale Supérieure, 2009.

H. Hochstadt, Integral Equations, Wiley, 1973.