R. Agrawal, Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, Advances in Applied Probability, vol.27, issue.4, pp.1054-1078, 1995.

J. Audibert and S. Bubeck, Minimax policies for adversarial and stochastic bandits, COLT, pp.217-226, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00834882

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol.32, issue.1, pp.48-77, 2002.
DOI : 10.1137/S0097539701398375

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.158

S. Bubeck and C.-Y. Liu, Prior-free and prior-dependent regret bounds for Thompson Sampling, 2014 48th Annual Conference on Information Sciences and Systems (CISS), pp.638-646, 2014.
DOI : 10.1109/CISS.2014.6814158

URL : http://arxiv.org/abs/1304.5758

A. N. Burnetas and M. N. Katehakis, Optimal adaptive policies for sequential allocation problems, Advances in Applied Mathematics, vol.17, issue.2, pp.122-142, 1996.

O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, and G. Stoltz, Kullback-Leibler upper confidence bounds for optimal sequential allocation, The Annals of Statistics, pp.1516-1541, 2013.

N. Cesa-Bianchi and G. Lugosi, Prediction, learning, and games, 2006.
DOI : 10.1017/CBO9780511546921

R. Degenne and V. Perchet, Anytime optimal algorithms in stochastic multi-armed bandits, Proceedings of the 33rd International Conference on Machine Learning, pp.1587-1595, 2016.

A. Garivier and O. Cappé, The KL-UCB algorithm for bounded stochastic bandits and beyond, COLT, pp.359-376, 2011.

A. Garivier, T. Lattimore, and E. Kaufmann, On explore-then-commit strategies, Advances in Neural Information Processing Systems, pp.784-792, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01322906

A. Garivier, P. Ménard, and G. Stoltz, Explore first, exploit next: The true shape of regret in bandit problems, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01276324

E. Kaufmann, On Bayesian index policies for sequential resource allocation. arXiv preprint, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01251606

E. Kaufmann, O. Cappé, and A. Garivier, On Bayesian upper confidence bounds for bandit problems, AISTATS, pp.592-600, 2012.

N. Korda, E. Kaufmann, and R. Munos, Thompson sampling for 1-dimensional exponential family bandits, Advances in Neural Information Processing Systems, pp.1448-1456, 2013.

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.

T. Lattimore, Optimally confident UCB: Improved regret for finite-armed bandits. arXiv preprint, 2015.

O. Maillard, R. Munos, and G. Stoltz, A finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences, Proceedings of the 23rd Annual Conference on Learning Theory, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00574987

M. Talagrand, The missing factor in Hoeffding's inequalities, Annales de l'IHP Probabilités et Statistiques, pp.689-702, 1995.