P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire, The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol.32, issue.1, pp.48-77, 2002.
DOI : 10.1137/S0097539701398375

P. Billingsley, Convergence of Probability Measures, 1968.
DOI : 10.1002/9780470316962

S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári, Online optimization in X-armed bandits, Advances in Neural Information Processing Systems (NIPS), pp.201-208, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00329797

P. Coquelin and R. Munos, Bandit algorithms for tree search, Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), pp.67-74, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00150207

L. Devroye and G. Lugosi, Combinatorial Methods in Density Estimation, 2001.
DOI : 10.1007/978-1-4613-0125-7

E. Even-Dar, S. Mannor, and Y. Mansour, PAC Bounds for Multi-armed Bandit and Markov Decision Processes, Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT), pp.255-270, 2002.
DOI : 10.1007/3-540-45435-7_18

S. Gelly, Y. Wang, R. Munos, and O. Teytaud, Modification of UCT with patterns in Monte-Carlo Go, Research report, INRIA, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00117266

W. Hoeffding, Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association, vol.58, issue.301, pp.13-30, 1963.
DOI : 10.1080/01621459.1963.10500830

R. Kleinberg, Nearly tight bounds for the continuum-armed bandit problem, Advances in Neural Information Processing Systems (NIPS), pp.697-704, 2004.

R. Kleinberg and A. Slivkins, Sharp Dichotomies for Regret Minimization in Metric Spaces, Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), pp.827-846, 2010.
DOI : 10.1137/1.9781611973075.68

L. Kocsis and C. Szepesvári, Bandit Based Monte-Carlo Planning, Proceedings of the 17th European Conference on Machine Learning (ECML), pp.282-293, 2006.
DOI : 10.1007/11871842_29

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8

O. Madani, D. Lizotte, and R. Greiner, The budgeted multi-armed bandit problem. Open problems session, Proceedings of the 17th Annual Conference on Computational Learning Theory (COLT), pp.643-645, 2004.

S. Mannor and J. N. Tsitsiklis, The sample complexity of exploration in the multi-armed bandit problem, Journal of Machine Learning Research, vol.5, pp.623-648, 2004.

C. McDiarmid, On the method of bounded differences, Surveys in Combinatorics, pp.148-188, 1989.
DOI : 10.1017/CBO9781107359949.008

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI : 10.1090/S0002-9904-1952-09620-8

K. Schlag, Eleven tests needed for a recommendation, Technical report, European University Institute, 2006.