On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, issue.3/4, pp.285-294, 1933.
Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, pp.1-122, 2012.
The theory of response-adaptive randomization in clinical trials, 2006.
DOI : 10.1002/047005588X
Group sequential methods with applications to clinical trials, 2000.
Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8
URL : https://doi.org/10.1016/0196-8858(85)90002-8
Complexity of Best-Arm Identification in Multi-Armed Bandits, Journal of Machine Learning Research, 2015.
Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012.
DOI : 10.1007/978-3-642-34106-9_18
URL : https://hal.archives-ouvertes.fr/hal-00830033
Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352
The best of both worlds: stochastic and adversarial bandits, Conference on Learning Theory (COLT), 2012.
A Fast Bandit Algorithm for Recommendation to Users With Heterogeneous Tastes, 27th AAAI Conference on Artificial Intelligence, pp.1135-1141, 2013.
Learning diverse rankings with multi-armed bandits, Proceedings of the 25th International Conference on Machine Learning, ICML '08, pp.784-791, 2008.
DOI : 10.1145/1390156.1390255
URL : http://www.cs.cornell.edu/People/tj/publications/radlinski_etal_08a.pdf