P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multi-armed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

S. Boucheron, G. Lugosi, and P. Massart, Concentration inequalities: A nonasymptotic theory of independence, 2013.
DOI : 10.1093/acprof:oso/9780199535255.001.0001
URL : https://hal.archives-ouvertes.fr/hal-00794821

S. Bubeck, Bandits Games and Clustering Foundations, 2010.
URL : https://hal.archives-ouvertes.fr/tel-00845565

S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, pp.1-122, 2012.

A. Garivier and O. Cappé, The KL-UCB algorithm for bounded stochastic bandits and beyond, Annual Conference on Learning Theory (COLT), 2011.

W. Hoeffding, Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association, vol.58, issue.301, pp.13-30, 1963.
DOI : 10.1080/01621459.1963.10500830

W. Jouini, Contribution to learning and decision making under uncertainty for Cognitive Radio.
URL : https://hal.archives-ouvertes.fr/tel-00765437

W. Jouini and C. Moy, Channel selection with Rayleigh fading: A multi-armed bandit framework, 2012 IEEE 13th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pp.299-303, 2012.
DOI : 10.1109/SPAWC.2012.6292914
URL : https://hal.archives-ouvertes.fr/hal-00721010

N. Korda, E. Kaufmann, and R. Munos, Thompson sampling for 1-dimensional exponential family bandits, Advances in Neural Information Processing Systems, pp.1448-1456, 2013.

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.