J. Audibert and S. Bubeck, Minimax policies for adversarial and stochastic bandits, Proceedings of the 22nd Annual Conference on Learning Theory, COLT'09, pp.217-226, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00834882

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol.32, issue.1, pp.48-77, 2002.
DOI : 10.1137/S0097539701398375
URL : http://homepages.math.uic.edu/%7Elreyzin/f14_mcs548/auer02.pdf

S. Boucheron, G. Lugosi, and P. Massart, Concentration inequalities. A nonasymptotic theory of independence, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00751496

S. Bubeck and C. Liu, Prior-free and prior-dependent regret bounds for Thompson Sampling, 2014 48th Annual Conference on Information Sciences and Systems (CISS), 2014.
DOI : 10.1109/CISS.2014.6814158
URL : http://www.princeton.edu/~sbubeck/NIPS13_BL.pdf

A. N. Burnetas and M. N. Katehakis, Optimal Adaptive Policies for Sequential Allocation Problems, Advances in Applied Mathematics, vol.17, issue.2, pp.122-142, 1996.
DOI : 10.1006/aama.1996.0007

O. Cappé, A. Garivier, O. Maillard, R. Munos, and G. Stoltz, Kullback-Leibler upper confidence bounds for optimal sequential allocation, The Annals of Statistics, pp.1516-1541, 2013.

Y. Chow and H. Teicher, Probability Theory, 1988.

R. Degenne and V. Perchet, Anytime optimal algorithms in stochastic multi-armed bandits, Proceedings of the 2016 International Conference on Machine Learning, ICML'16, pp.1587-1595, 2016.

A. Garivier and O. Cappé, The KL-UCB algorithm for bounded stochastic bandits and beyond, Proceedings of the 24th Annual Conference on Learning Theory, 2011.

A. Garivier, P. Ménard, and G. Stoltz, Explore first, exploit next: The true shape of regret in bandit problems, Mathematics of Operations Research, 2018.

W. Hoeffding, Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association, vol.58, issue.301, pp.13-30, 1963.
DOI : 10.1080/01621459.1963.10500830

J. Honda and A. Takemura, Non-asymptotic analysis of a new bandit algorithm for semi-bounded rewards, Journal of Machine Learning Research, vol.16, pp.3721-3756, 2015.