Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári, Improved Algorithms for Linear Stochastic Bandits, Advances in Neural Information Processing Systems, 2011.

S. Agrawal and N. Goyal, Thompson Sampling for Contextual Bandits with Linear Payoffs, International Conference on Machine Learning (ICML), 2013.

A. Antos, V. Grover, and C. Szepesvári, Active Learning in Multi-armed Bandits, Algorithmic Learning Theory, 2008.
DOI : 10.1007/978-3-540-87987-9_25

A. Barron, J. Rissanen, and B. Yu, The minimum description length principle in coding and modeling, IEEE Transactions on Information Theory, vol.44, issue.6, pp.2743-2760, 1998.

S. Bubeck and N. Cesa-Bianchi, Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Foundations and Trends in Machine Learning, vol.5, issue.1, pp.1-122, 2012.
DOI : 10.1561/2200000024

S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári, X-Armed Bandits, Journal of Machine Learning Research, vol.12, pp.1587-1627, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00450235

O. Cappé, A. Garivier, O. Maillard, R. Munos, and G. Stoltz, Kullback-Leibler upper confidence bounds for optimal sequential allocation, The Annals of Statistics, vol.41, issue.3, pp.1516-1541, 2013.
DOI : 10.1214/13-AOS1119SUPP

A. Chambaz, A. Garivier, and E. Gassiat, A minimum description length approach to hidden Markov models with Poisson and Gaussian emissions. Application to order identification, Journal of Statistical Planning and Inference, vol.139, issue.3, pp.962-977, 2009.
DOI : 10.1016/j.jspi.2008.06.010

H. Chernoff, Sequential Design of Experiments, The Annals of Mathematical Statistics, vol.30, issue.3, pp.755-770, 1959.
DOI : 10.1214/aoms/1177706205

R. Combes and A. Proutière, Unimodal Bandits without Smoothness, 2014.

E. Even-Dar, S. Mannor, and Y. Mansour, Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, Journal of Machine Learning Research, vol.7, pp.1079-1105, 2006.

V. Gabillon, M. Ghavamzadeh, and A. Lazaric, Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, Advances in Neural Information Processing Systems, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00747005

A. Garivier, Consistency of the Unlimited BIC Context Tree Estimator, IEEE Transactions on Information Theory, vol.52, issue.10, pp.4630-4635, 2006.
DOI : 10.1109/TIT.2006.881742

T. L. Graves and T. L. Lai, Asymptotically Efficient Adaptive Choice of Control Laws in Controlled Markov Chains, SIAM Journal on Control and Optimization, vol.35, issue.3, pp.715-743, 1997.
DOI : 10.1137/S0363012994275440

P. D. Grünwald, The Minimum Description Length Principle (Adaptive Computation and Machine Learning), The MIT Press, 2007.

K. Jamieson, M. Malloy, R. Nowak, and S. Bubeck, lil'UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits, Proceedings of the 27th Conference on Learning Theory, 2014.

S. Kalyanakrishnan, A. Tewari, P. Auer, and P. Stone, PAC Subset Selection in Stochastic Multi-armed Bandits, International Conference on Machine Learning (ICML), 2012.

E. Kaufmann and S. Kalyanakrishnan, Information complexity in bandit subset selection, Proceedings of the 26th Conference on Learning Theory, 2013.

E. Kaufmann, O. Cappé, and A. Garivier, On the Complexity of A/B Testing, Proceedings of the 27th Conference on Learning Theory, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00990254

E. Kaufmann, O. Cappé, and A. Garivier, On the Complexity of Best Arm Identification in Multi-Armed Bandit Models, Journal of Machine Learning Research, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01024894

R. E. Krichevsky and V. K. Trofimov, The performance of universal encoding, IEEE Transactions on Information Theory, vol.27, issue.2, pp.199-206, 1981.

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8

S. Magureanu, R. Combes, and A. Proutière, Lipschitz Bandits: Regret lower bounds and optimal algorithms, Proceedings of the 27th Conference on Learning Theory, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01092791

S. Mannor and J. Tsitsiklis, The Sample Complexity of Exploration in the Multi-Armed Bandit Problem, Journal of Machine Learning Research, vol.5, pp.623-648, 2004.

R. Munos, From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, Foundations and Trends in Machine Learning, 2014.
DOI : 10.1561/2200000038
URL : https://hal.archives-ouvertes.fr/hal-00747575

J. Rissanen, Modeling by shortest data description, Automatica, vol.14, issue.5, pp.465-471, 1978.
DOI : 10.1016/0005-1098(78)90005-5

N. Srinivas, A. Krause, S. Kakade, and M. Seeger, Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design, Proceedings of the International Conference on Machine Learning, 2010.

N. K. Vaidhyan and R. Sundaresan, Learning to detect an oddball target, 2015.

F. M. J. Willems, Y. M. Shtarkov, and T. J. Tjalkens, The context tree weighting method: Basic properties, IEEE Transactions on Information Theory, vol.41, pp.653-664, 1995.