H. Akaike, Information theory and an extension of the maximum likelihood principle, Second International Symposium on Information Theory (Tsahkadsor, 1971), pp.267-281, 1973.

P. Alquier, PAC-Bayesian bounds for randomized empirical risk minimizers, Mathematical Methods of Statistics, vol.17, issue.4, pp.279-304, 2008.
DOI : 10.3103/S1066530708040017
URL : https://hal.archives-ouvertes.fr/hal-00195698

D. W. Andrews, Non-strong mixing autoregressive processes, Journal of Applied Probability, vol.21, issue.04, pp.930-934, 1984.
DOI : 10.2307/3212764

J. Y. Audibert, Aggregated estimators and empirical complexity for least square regression, Annales de l'Institut Henri Poincaré (B) Probability and Statistics, vol.40, issue.6, pp.685-736, 2004.
DOI : 10.1016/j.anihpb.2003.11.006

Y. Baraud, F. Comte, and G. Viennet, Adaptive estimation in autoregression or β-mixing regression via model selection, Ann. Statist., vol.29, pp.839-875, 2001.

A. R. Barron, Approximation and estimation bounds for artificial neural networks, Machine Learning, vol.14, pp.115-133, 1994.

O. Catoni, A PAC-Bayesian approach to adaptative classification, 2003.

O. Catoni, Statistical Learning Theory and Stochastic Optimization, Lecture Notes in Math., vol.1851, Lecture notes from the 31st Summer School on Probability Theory held in Saint-Flour, MR2163920, 2004.

O. Catoni, PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, Institute of Mathematical Statistics Lecture Notes - Monograph Series 56. Beachwood, OH: IMS. MR2483528, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00206119

A. Dalalyan and A. Tsybakov, Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity, Machine Learning, vol.72, issue.1-2, pp.39-61, 2008.
DOI : 10.1007/s10994-008-5051-0
URL : https://hal.archives-ouvertes.fr/hal-00265651

J. Dedecker, P. Doukhan, G. Lang, J. R. León, S. Louhichi et al., Weak dependence, Lecture Notes in Statistics, vol.190, MR2338725, 2007.
DOI : 10.1007/978-0-387-69952-3_2
URL : https://hal.archives-ouvertes.fr/hal-00686031

J. Dedecker and C. Prieur, New dependence coefficients. Examples and applications to statistics, Probability Theory and Related Fields, vol.132, issue.2, pp.203-236, 2005.
DOI : 10.1007/s00440-004-0394-3

P. Doukhan, Mixing: Properties and Examples, Lecture Notes in Statistics, vol.85, MR1312160, 1994.

P. Doukhan and O. Wintenberger, Weakly dependent chains with infinite memory, Stochastic Process. Appl., vol.118, pp.1997-2013, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00199890

S. Goldstein, Maximal coupling, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, vol.46, issue.2, pp.193-204, 1978.
DOI : 10.1007/BF00533259

I. Ibragimov, Some limit theorems for stationary processes, Theory Probab. Appl., vol.7, pp.349-382, 1962.

C. K. Ing, Accumulated prediction errors, information criteria and optimal forecasting for autoregressive time series, The Annals of Statistics, vol.35, issue.3, pp.1238-1277, 2007.
DOI : 10.1214/009053606000001550

C. Lacour, Nonparametric estimation of the stationary density and the transition density of a Markov chain, Stochastic Processes and their Applications, vol.118, issue.2, pp.232-260, 2008.
DOI : 10.1016/j.spa.2007.04.013
URL : https://hal.archives-ouvertes.fr/hal-01139399

P. Massart, Concentration Inequalities and Model Selection, Lecture Notes in Math., vol.1896, Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, MR2319879, 2007.

D. A. McAllester, Some PAC-Bayesian theorems, Proceedings of the eleventh annual conference on Computational learning theory, COLT '98, pp.230-234, 1998.
DOI : 10.1145/279943.279989

R. Meir, Nonparametric time series prediction through adaptive model selection, Machine Learning, vol.39, issue.1, pp.5-34, 2000.
DOI : 10.1023/A:1007602715810

D. S. Modha and E. Masry, Memory-universal prediction of stationary random processes, IEEE Transactions on Information Theory, vol.44, issue.1, pp.117-133, 1998.
DOI : 10.1109/18.650998

R Development Core Team, R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2008.

E. Rio, Inégalités de Hoeffding pour les fonctions lipschitziennes de suites dépendantes, Comptes Rendus de l'Académie des Sciences - Series I - Mathematics, vol.330, issue.10, pp.905-908, 2000.
DOI : 10.1016/S0764-4442(00)00290-1

E. Rio, Théorie Asymptotique des Processus Aléatoires Faiblement Dépendants, Mathématiques & Applications (Berlin) [Mathematics & Applications] 31, 2000.

G. Schwarz, Estimating the Dimension of a Model, The Annals of Statistics, vol.6, issue.2, pp.461-464, 1978.
DOI : 10.1214/aos/1176344136

J. Shawe-Taylor and R. Williamson, A PAC analysis of a Bayesian estimator, Proceedings of the Tenth Annual Conference on Computational Learning Theory, COLT '97, pp.2-9, 1997.

G. Stoltz, Information incomplète et regret interne en prédiction de suites individuelles, 2005.

V. N. Vapnik, The Nature of Statistical Learning Theory, 1995.