J. Audibert, R. Munos, and C. Szepesvári, Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, Theoretical Computer Science, vol.410, issue.19, pp.1876-1902, 2009.
DOI : 10.1016/j.tcs.2009.01.016
URL : https://hal.archives-ouvertes.fr/hal-00711069

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

M. Babaioff, Y. Sharma, and A. Slivkins, Characterizing truthful multiarmed bandit mechanisms: extended abstract, Proceedings of the tenth ACM conference on Electronic commerce, pp.79-88, 2009.
DOI : 10.1145/1566374.1566386
URL : http://arxiv.org/abs/0812.2291

D. Bergemann and J. Välimäki, Bandit Problems, The New Palgrave Dictionary of Economics, 2008.
DOI : 10.1057/978-1-349-95121-5_2386-1

S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári, Online optimization in X-armed bandits, Advances in Neural Information Processing Systems 21, pp.201-208, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00329797

A. N. Burnetas and M. N. Katehakis, Optimal Adaptive Policies for Sequential Allocation Problems, Advances in Applied Mathematics, vol.17, issue.2, pp.122-142, 1996.
DOI : 10.1006/aama.1996.0007

P. A. Coquelin and R. Munos, Bandit algorithms for tree search, Uncertainty in Artificial Intelligence, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00150207

N. R. Devanur and S. M. Kakade, The price of truthfulness for pay-per-click auctions, Proceedings of the tenth ACM conference on Electronic commerce, pp.99-106, 2009.

A. Garivier and O. Cappé, The KL-UCB algorithm for bounded stochastic bandits and beyond, arXiv preprint arXiv:1102, 2011.

S. Gelly and Y. Wang, Exploration exploitation in Go: UCT for Monte-Carlo Go, Online trading between exploration and exploitation Workshop, Twentieth Annual Conference on Neural Information Processing Systems, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00115330

J. H. Holland, Adaptation in natural and artificial systems, 1992.

J. Honda and A. Takemura, An asymptotically optimal bandit algorithm for bounded support models, Proceedings of the Twenty-Third Annual Conference on Learning Theory (COLT), 2010.

R. Kleinberg, A. Slivkins, and E. Upfal, Multi-armed bandits in metric spaces, Proceedings of the fortieth annual ACM symposium on Theory of computing, STOC '08, pp.681-690, 2008.
DOI : 10.1145/1374376.1374475

R. D. Kleinberg, Nearly tight bounds for the continuum-armed bandit problem, Advances in Neural Information Processing Systems 17, pp.697-704, 2005.

L. Kocsis and C. Szepesvári, Bandit Based Monte-Carlo Planning, Proceedings of the 17th European Conference on Machine Learning (ECML-2006), pp.282-293, 2006.
DOI : 10.1007/11871842_29
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8
URL : http://doi.org/10.1016/0196-8858(85)90002-8

D. Lamberton, G. Pagès, and P. Tarrès, When can the two-armed bandit algorithm be trusted?, Annals of Applied Probability, vol.14, issue.3, pp.1424-1454, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00102253

O. A. Maillard, R. Munos, and G. Stoltz, A finite-time analysis of multiarmed bandits problems with Kullback-Leibler divergences, arXiv preprint arXiv:1105, 2011.

P. Massart, The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality, The Annals of Probability, pp.1269-1283, 1990.

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI : 10.1090/S0002-9904-1952-09620-8

W. Rudin, Real and complex analysis (3rd ed.), 1986.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192