Best Arm Identification in Multi-armed Bandits, Proceedings of the 23rd Conference on Learning Theory, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00654404
Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002. ,
DOI : 10.1023/A:1013689704352
Heads-up limit hold'em poker is solved, Science, vol.347, issue.6218, pp.145-149, 2015. ,
DOI : 10.1126/science.1259433
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.697.72
A Survey of Monte Carlo Tree Search Methods, IEEE Transactions on Computational Intelligence and AI in Games, vol.4, issue.1, pp.1-49, 2012. ,
DOI : 10.1109/TCIAIG.2012.2186810
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Machine Learning, pp.1-122, 2012. ,
DOI : 10.1561/2200000024
Pure exploration in finitely-armed and continuous-armed bandits, Theoretical Computer Science, vol.412, issue.19, pp.1832-18521832, 2011. ,
DOI : 10.1016/j.tcs.2010.12.059
URL : https://hal.archives-ouvertes.fr/hal-00609550
Kullback???Leibler upper confidence bounds for optimal sequential allocation, The Annals of Statistics, vol.41, issue.3, pp.1516-1541, 2013. ,
DOI : 10.1214/13-AOS1119SUPP
Action Elimination and Stopping Conditions for the Multi- Armed Bandit and Reinforcement Learning Problems, Journal of Machine Learning Research, vol.7, pp.1079-1105, 2006. ,
Competitive Markov Decision Processes, 1996. ,
DOI : 10.1007/978-1-4612-4054-9
Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, Advances in Neural Information Processing Systems, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00747005
Optimal best arm identification with fixed confidence, Proceedings of the 29th Conference On Learning Theory, p.2016 ,
URL : https://hal.archives-ouvertes.fr/hal-01273838
The grand challenge of computer Go, Communications of the ACM, vol.55, issue.3, pp.106-113, 2012. ,
DOI : 10.1145/2093548.2093574
URL : https://hal.archives-ouvertes.fr/hal-00695370
UCB: an Optimal Exploration Algorithm for Multi-Armed Bandits, Proceedings of the 27th Conference on Learning Theory, 2014. ,
PAC subset selection in stochastic multi-armed bandits, International Conference on Machine Learning (ICML), 2012. ,
Information complexity in bandit subset selection, Proceeding of the 26th Conference On Learning Theory, 2013. ,
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models, Journal of Machine Learning Research, p.2015 ,
URL : https://hal.archives-ouvertes.fr/hal-01024894
Bandit Based Monte-Carlo Planning, Proceedings of the 17th European Conference on Machine Learning, ECML'06, pp.282-293, 2006. ,
DOI : 10.1007/11871842_29
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296
The Racing Algorithm: Model Selection for Lazy Learners, Artificial Intelligence Review, vol.11, issue.1-5, pp.113-131, 1997. ,
DOI : 10.1007/978-94-017-2053-3_8
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, Machine Learning, 2014. ,
DOI : 10.1561/2200000038
URL : https://hal.archives-ouvertes.fr/hal-00747575
Mastering the game of Go with deep neural networks and tree search, Nature, vol.34, issue.7587, pp.484-489, 2016. ,
DOI : 10.1038/nature16961
Optimistic planning in markov decision processes using a generative model, Advances in Neural Information Processing Systems, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01079366