P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol.32, issue.1, pp.48-77, 2002.
DOI : 10.1137/S0097539701398375

P. Auer and R. Ortner, UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem, Periodica Mathematica Hungarica, vol.61, issue.1-2, pp.55-65, 2010.
DOI : 10.1007/s10998-010-3055-6

S. Bubeck, Bandits games and clustering foundations, 2010.
URL : https://hal.archives-ouvertes.fr/tel-00845565

S. Bubeck and N. Cesa-Bianchi, Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Foundations and Trends in Machine Learning, vol.5, issue.1, pp.1-122, 2012.
DOI : 10.1561/2200000024

S. Bubeck, V. Perchet, and P. Rigollet, Bounded regret in stochastic multi-armed bandits, Proceedings of the 26th Annual Conference on Learning Theory (COLT), JMLR W&CP, pp.122-134, 2013.

S. Bubeck, V. Perchet, and P. Rigollet, Erratum to [6], 2013. The proof of Theorem 8 is not correct; we do not know if the theorem holds true.
URL : http://research.microsoft.com/en-us/um/people/sebubeck/pub.html

A. N. Burnetas and M. N. Katehakis, Optimal Adaptive Policies for Sequential Allocation Problems, Advances in Applied Mathematics, vol.17, issue.2, pp.122-142, 1996.
DOI : 10.1006/aama.1996.0007

C. Calabro, The exponential complexity of satisfiability problems, 2009.

O. Cappé, A. Garivier, O. Maillard, R. Munos, and G. Stoltz, Kullback–Leibler upper confidence bounds for optimal sequential allocation, The Annals of Statistics, vol.41, issue.3, pp.1516-1541, 2013.
DOI : 10.1214/13-AOS1119

R. Combes and A. Proutière, Unimodal bandits without smoothness, arXiv:1406.7447, 2014.

A. Garivier, E. Kaufmann, and T. Lattimore, On explore-then-commit strategies, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01322906

E. Kaufmann, O. Cappé, and A. Garivier, On the complexity of best arm identification in multi-armed bandit models, Journal of Machine Learning Research, 2016.

S. Kulkarni and G. Lugosi, Finite-time lower bounds for the two-armed bandit problem, IEEE Transactions on Automatic Control, vol.45, issue.4, pp.711-714, 2000.
DOI : 10.1109/9.847107

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8

E. L. Lehmann and G. Casella, Theory of Point Estimation, 1998.

W. R. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, issue.3-4, pp.285-294, 1933.
DOI : 10.1093/biomet/25.3-4.285

Y. Wu, A. György, and C. Szepesvári, Online learning with Gaussian payoffs and side observations, Advances in Neural Information Processing Systems 28, pp.1360-1368, 2015.