Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, Advances in Applied Probability, vol.27, issue.4, pp.1054-1078, 1995.
Exploration-exploitation trade-off using variance estimates in multiarmed bandits, Theoretical Computer Science, vol.410, issue.19, pp.1876-1902, 2009.
Regret bounds and minimax policies under partial monitoring, Journal of Machine Learning Research, vol.11, pp.2635-2686, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00654356
Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352
The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Computational Mathematics and Mathematical Physics, pp.200-217, 1967.
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Foundations and Trends® in Machine Learning, vol.5, issue.1, pp.1-122, 2012.
DOI : 10.1561/2200000024
Optimal Adaptive Policies for Markov Decision Processes, Mathematics of Operations Research, vol.22, issue.1, pp.222-255, 1997.
DOI : 10.1287/moor.22.1.222
Kullback–Leibler upper confidence bounds for optimal sequential allocation, The Annals of Statistics, vol.41, issue.3, pp.1516-1541, 2013.
DOI : 10.1214/13-AOS1119
Probability theory, 2nd ed., p.988, 1988.
Mesures dominantes et théorème de Sanov, Annales de l'IHP Probabilités et statistiques, pp.365-373, 1992.
Explore first, exploit next: The true shape of regret in bandit problems, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01276324
Bandit processes and dynamic allocation indices, Journal of the Royal Statistical Society, Series B, vol.41, issue.2, pp.148-177, 1979.
DOI : 10.1002/9780470980033
An asymptotically optimal bandit algorithm for bounded support models, Conf. Comput. Learning Theory, 2010.
Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8
URL : https://doi.org/10.1016/0196-8858(85)90002-8
Adaptive treatment allocation and the multi-armed bandit problem, The Annals of Statistics, pp.1091-1114, 1987.
Boundary crossing problems for sample means, The Annals of Probability, pp.375-396, 1988.
A finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences, Conf. Comput. Learning Theory, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00574987
Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI : 10.1090/S0002-9904-1952-09620-8
Herbert Robbins Selected Papers, 2012.
On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, issue.3/4, pp.285-294, 1933.
On a criterion for the rejection of observations and the distribution of the ratio of deviation to sample standard deviation, The Annals of Mathematical Statistics, vol.6, issue.4, pp.214-219, 1935.
Sequential Tests of Statistical Hypotheses, The Annals of Mathematical Statistics, vol.16, issue.2, pp.117-186, 1945.
DOI : 10.1214/aoms/1177731118