J. Audibert, S. Bubeck, and R. Munos, Best arm identification in multi-armed bandits, Proceedings of the Twenty-Third Annual Conference on Learning Theory, pp.41-53, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00654404

J. Audibert, R. Munos, and C. Szepesvári, Tuning Bandit Algorithms in Stochastic Environments, Algorithmic Learning Theory, pp.150-165, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00203487

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multi-armed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

S. Bubeck, R. Munos, and G. Stoltz, Pure Exploration in Multi-armed Bandits Problems, Proceedings of the Twentieth International Conference on Algorithmic Learning Theory, pp.23-37, 2009.

K. Deng, J. Pineau, and S. Murphy, Active learning for personalizing treatment, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2011.
DOI : 10.1109/ADPRL.2011.5967348

C. Dimitrakakis and M. Lagoudakis, Rollout sampling approximate policy iteration, Machine Learning, vol.72, issue.3, pp.157-171, 2008.
DOI : 10.1007/s10994-008-5069-3
URL : http://arxiv.org/abs/0805.2027

E. Even-Dar, S. Mannor, and Y. Mansour, Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems, Journal of Machine Learning Research, vol.7, pp.1079-1105, 2006.

V. Gabillon, M. Ghavamzadeh, A. Lazaric, and S. Bubeck, Multi-bandit best arm identification, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00632523

M. Lagoudakis and R. Parr, Reinforcement learning as classification: Leveraging modern classifiers, Proceedings of the Twentieth International Conference on Machine Learning, pp.424-431, 2003.

O. Maron and A. Moore, Hoeffding races: Accelerating model selection search for classification and function approximation, Proceedings of Advances in Neural Information Processing Systems 6, 1993.

A. Maurer and M. Pontil, Empirical Bernstein bounds and sample-variance penalization, Proceedings of the 22nd Annual Conference on Learning Theory, 2009.

V. Mnih, C. Szepesvári, and J. Audibert, Empirical Bernstein stopping, Proceedings of the 25th International Conference on Machine Learning, ICML '08, pp.672-679, 2008.
DOI : 10.1145/1390156.1390241
URL : https://hal.archives-ouvertes.fr/hal-00834983

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI : 10.1090/S0002-9904-1952-09620-8