J. Audibert, S. Bubeck, and R. Munos, Best Arm Identification in Multi-armed Bandits, Proceedings of the 23rd Conference on Learning Theory, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00654404

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

M. Bowling, N. Burch, M. Johanson, and O. Tammelin, Heads-up limit hold'em poker is solved, Science, vol.347, issue.6218, pp.145-149, 2015.
DOI : 10.1126/science.1259433
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.697.72

C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. Cowling et al., A Survey of Monte Carlo Tree Search Methods, IEEE Transactions on Computational Intelligence and AI in Games, vol.4, issue.1, pp.1-49, 2012.
DOI : 10.1109/TCIAIG.2012.2186810

S. Bubeck and N. Cesa-Bianchi, Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Foundations and Trends in Machine Learning, pp.1-122, 2012.
DOI : 10.1561/2200000024

S. Bubeck, R. Munos, and G. Stoltz, Pure exploration in finitely-armed and continuous-armed bandits, Theoretical Computer Science, vol.412, issue.19, pp.1832-1852, 2011.
DOI : 10.1016/j.tcs.2010.12.059
URL : https://hal.archives-ouvertes.fr/hal-00609550

O. Cappé, A. Garivier, O. Maillard, R. Munos, and G. Stoltz, Kullback–Leibler upper confidence bounds for optimal sequential allocation, The Annals of Statistics, vol.41, issue.3, pp.1516-1541, 2013.
DOI : 10.1214/13-AOS1119SUPP

E. Even-Dar, S. Mannor, and Y. Mansour, Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, Journal of Machine Learning Research, vol.7, pp.1079-1105, 2006.

J. Filar and K. Vrieze, Competitive Markov Decision Processes, 1996.
DOI : 10.1007/978-1-4612-4054-9

V. Gabillon, M. Ghavamzadeh, and A. Lazaric, Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, Advances in Neural Information Processing Systems, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00747005

A. Garivier and E. Kaufmann, Optimal best arm identification with fixed confidence, Proceedings of the 29th Conference on Learning Theory, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01273838

S. Gelly, L. Kocsis, M. Schoenauer, M. Sebag, D. Silver et al., The grand challenge of computer Go, Communications of the ACM, vol.55, issue.3, pp.106-113, 2012.
DOI : 10.1145/2093548.2093574
URL : https://hal.archives-ouvertes.fr/hal-00695370

K. Jamieson, M. Malloy, R. Nowak, and S. Bubeck, lil' UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits, Proceedings of the 27th Conference on Learning Theory, 2014.

S. Kalyanakrishnan, A. Tewari, P. Auer, and P. Stone, PAC subset selection in stochastic multi-armed bandits, International Conference on Machine Learning (ICML), 2012.

E. Kaufmann and S. Kalyanakrishnan, Information complexity in bandit subset selection, Proceedings of the 26th Conference on Learning Theory, 2013.

E. Kaufmann, O. Cappé, and A. Garivier, On the Complexity of Best Arm Identification in Multi-Armed Bandit Models, Journal of Machine Learning Research, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01024894

L. Kocsis and C. Szepesvári, Bandit Based Monte-Carlo Planning, Proceedings of the 17th European Conference on Machine Learning, ECML'06, pp.282-293, 2006.
DOI : 10.1007/11871842_29
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296

O. Maron and A. Moore, The Racing Algorithm: Model Selection for Lazy Learners, Artificial Intelligence Review, vol.11, issue.1-5, pp.113-131, 1997.
DOI : 10.1007/978-94-017-2053-3_8

R. Munos, From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, Foundations and Trends in Machine Learning, 2014.
DOI : 10.1561/2200000038
URL : https://hal.archives-ouvertes.fr/hal-00747575

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre et al., Mastering the game of Go with deep neural networks and tree search, Nature, vol.529, issue.7587, pp.484-489, 2016.
DOI : 10.1038/nature16961

B. Szorenyi, G. Kedenburg, and R. Munos, Optimistic planning in Markov decision processes using a generative model, Advances in Neural Information Processing Systems, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01079366