H. Akaike, Information theory and an extension of the maximum likelihood principle, 2nd International Symposium on Information Theory, pp.267-281, 1973.

P. Alquier, PAC-Bayesian bounds for randomized empirical risk minimizers, Mathematical Methods of Statistics, vol.17, issue.4, pp.279-304, 2008.
DOI : 10.3103/S1066530708040017
URL : https://hal.archives-ouvertes.fr/hal-00354922

P. Alquier and K. Lounici, PAC-Bayesian bounds for sparse regression estimation with exponential weights, Electronic Journal of Statistics, vol.5, issue.0, pp.127-145, 2011.
DOI : 10.1214/11-EJS601
URL : https://hal.archives-ouvertes.fr/hal-00465801

P. Alquier and O. Wintenberger, Model selection for weakly dependent time series forecasting, Bernoulli (to appear), available on arXiv:0902, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00362151

K. B. Athreya and S. G. Pantula, Mixing properties of Harris chains and autoregressive processes, Journal of Applied Probability, vol.23, issue.4, pp.880-892, 1986.
DOI : 10.1007/BF01025869

J. Audibert, Théorie statistique de l'apprentissage : une approche PAC-bayésienne, 2004.

J. Audibert, PAC-Bayesian aggregation and multi-armed bandits, 2010.
URL : https://hal.archives-ouvertes.fr/tel-00536084

P. Brockwell and R. Davis, Time Series: Theory and Methods, 2009.

P. Bühlmann and S. van de Geer, Statistics for High-Dimensional Data, 2011.
DOI : 10.1007/978-3-642-20192-9

O. Catoni, A PAC-Bayesian approach to adaptive classification, Preprint Laboratoire de Probabilités et Modèles Aléatoires, 2003.

O. Catoni, Statistical Learning Theory and Stochastic Optimization, Lecture Notes in Mathematics, vol.1851, 2001.
DOI : 10.1007/b99352
URL : https://hal.archives-ouvertes.fr/hal-00104952

O. Catoni, PAC-Bayesian Supervised Classification (The Thermodynamics of Statistical Learning), IMS Lecture Notes-Monograph Series, vol.56, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00206119

A. Dalalyan and A. Tsybakov, Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity, Machine Learning, pp.39-61, 2008.
DOI : 10.1007/s10994-008-5051-0
URL : https://hal.archives-ouvertes.fr/hal-00291504

J. Dedecker, P. Doukhan, G. Lang, J. R. León, S. Louhichi et al., Weak Dependence: With Examples and Applications, Lecture Notes in Statistics, vol.190, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00686031

M. D. Donsker and S. R. S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time, III, Communications on Pure and Applied Mathematics, vol.29, issue.4, pp.389-461, 1976.
DOI : 10.1002/cpa.3160290405

P. Doukhan, Mixing: Properties and Examples, Lecture Notes in Statistics, vol.85, 1994.

S. Gerchinovitz, Sparsity regret bounds for individual sequences in online linear regression, Proceedings of COLT'11, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00552267

P. J. Green, Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika, vol.82, issue.4, pp.711-732, 1995.
DOI : 10.1093/biomet/82.4.711

J. Hamilton, Time Series Analysis, 1994.

I. A. Ibragimov, Some limit theorems for stationary processes, Theory of Probability and its Applications, pp.349-382, 1962.

N. Littlestone and M. K. Warmuth, The Weighted Majority Algorithm, Information and Computation, vol.108, issue.2, pp.212-261, 1994.
DOI : 10.1006/inco.1994.1009

J. Marin and C. P. Robert, Bayesian Core: A practical approach to computational Bayesian analysis, 2007.

D. A. McAllester, PAC-Bayesian model averaging, Proceedings of the Twelfth Annual Conference on Computational Learning Theory, COLT '99, pp.164-170, 1999.
DOI : 10.1145/307400.307435

R. Meir, Nonparametric time series prediction through adaptive model selection, Machine Learning, pp.5-34, 2000.

S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability, Communications and Control Engineering Series, 1993.

D. S. Modha and E. Masry, Memory-universal prediction of stationary random processes, IEEE Transactions on Information Theory, vol.44, issue.1, pp.117-133, 1998.
DOI : 10.1109/18.650998

R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, 2008.

C. P. Robert, Méthodes de Monte Carlo par chaînes de Markov, Economica, 1996.

P. Samson, Concentration of measure inequalities for Markov chains and Φ-mixing processes, The Annals of Probability, pp.416-461, 2000.

Y. Seldin, F. Laviolette, N. Cesa-Bianchi, P. Auer, and J. Shawe-Taylor, PAC-Bayesian Inequalities for Martingales, IEEE Transactions on Information Theory, vol.58, issue.12, 2011.
DOI : 10.1109/TIT.2012.2211334

J. Shawe-Taylor and R. Williamson, A PAC analysis of a Bayesian estimator, Proceedings of the Tenth Annual Conference on Computational Learning Theory, COLT '97, pp.2-9, 1997.

G. Stoltz, Agrégation séquentielle de prédicteurs : méthodologie générale et applications à la prévision de la qualité de l'air et à celle de la consommation électrique, Journal de la SFdS, vol.151, issue.2, pp.66-106, 2010.

R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Statist. Soc. Ser. B, vol.58, issue.1, pp.267-288, 1996.

A. Tsybakov, Optimal Rates of Aggregation, Learning Theory and Kernel Machines, pp.303-313, 2003.
DOI : 10.1007/978-3-540-45167-9_23
URL : https://hal.archives-ouvertes.fr/hal-00104867

V. G. Vovk, Aggregating strategies, Proceedings of the 3rd Annual Workshop on Computational Learning Theory (COLT), pp.371-383, 1990.
DOI : 10.1016/B978-1-55860-146-8.50032-1

O. Wintenberger, Deviation inequalities for sums of weakly dependent time series, Electronic Communications in Probability, vol.15, pp.489-503, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00430608