Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, Theoretical Computer Science, vol.410, issue.19, pp.1876-1902, 2009. ,
DOI : 10.1016/j.tcs.2009.01.016
URL : https://hal.archives-ouvertes.fr/hal-00711069
Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002. ,
DOI : 10.1023/A:1013689704352
Characterizing truthful multiarmed bandit mechanisms: extended abstract, Proceedings of the tenth ACM conference on Electronic commerce, pp.79-88, 2009. ,
DOI : 10.1145/1566374.1566386
URL : http://arxiv.org/abs/0812.2291
Bandit Problems, The New Palgrave Dictionary of Economics, 2008. ,
DOI : 10.1057/978-1-349-95121-5_2386-1
Online optimization in X-armed bandits, Advances in Neural Information Processing Systems 21, pp.201-208, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00329797
Optimal Adaptive Policies for Sequential Allocation Problems, Advances in Applied Mathematics, vol.17, issue.2, pp.122-142, 1996. ,
DOI : 10.1006/aama.1996.0007
Bandit algorithms for tree search, Uncertainty in Artificial Intelligence, 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00150207
The price of truthfulness for pay-per-click auctions, Proceedings of the tenth ACM conference on Electronic commerce, pp.99-106, 2009. ,
The KL-UCB algorithm for bounded stochastic bandits and beyond, arXiv preprint arXiv:1102, 2011. ,
Exploration exploitation in Go: UCT for Monte-Carlo Go, Online trading between exploration and exploitation Workshop, Twentieth Annual Conference on Neural Information Processing Systems, 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-00115330
Adaptation in natural and artificial systems, 1992. ,
An asymptotically optimal bandit algorithm for bounded support models, Proceedings of the Twenty-Third Annual Conference on Learning Theory (COLT), 2010. ,
Multi-armed bandits in metric spaces, Proceedings of the fortieth annual ACM symposium on Theory of computing, STOC '08, pp.681-690, 2008. ,
DOI : 10.1145/1374376.1374475
Nearly tight bounds for the continuum-armed bandit problem, Advances in Neural Information Processing Systems 17, pp.697-704, 2005. ,
Bandit Based Monte-Carlo Planning, Proceedings of the 17th European Conference on Machine Learning (ECML-2006), pp.282-293, 2006. ,
DOI : 10.1007/11871842_29
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296
Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985. ,
DOI : 10.1016/0196-8858(85)90002-8
URL : http://doi.org/10.1016/0196-8858(85)90002-8
When can the two-armed bandit algorithm be trusted?, Annals of Applied Probability, vol.14, issue.3, pp.1424-1454, 2004. ,
URL : https://hal.archives-ouvertes.fr/hal-00102253
A finite-time analysis of multiarmed bandits problems with Kullback-Leibler divergences, arXiv preprint arXiv:1105, 2011. ,
The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality, The Annals of Probability, pp.1269-1283, 1990. ,
Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952. ,
DOI : 10.1090/S0002-9904-1952-09620-8
Real and complex analysis (3rd ed.), 1986. ,
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998. ,
DOI : 10.1109/TNN.1998.712192