W. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, pp.285-294, 1933.

S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, pp.1-122, 2012.

F. Hu and W. F. Rosenberger, The theory of response-adaptive randomization in clinical trials, 2006.
DOI : 10.1002/047005588X

C. Jennison and B. W. Turnbull, Group sequential methods with applications to clinical trials, 2000.

T. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8
URL : https://doi.org/10.1016/0196-8858(85)90002-8

E. Kaufmann, O. Cappé, and A. Garivier, Complexity of Best-Arm Identification in Multi-Armed Bandits, Journal of Machine Learning Research, 2015.

E. Kaufmann, N. Korda, and R. Munos, Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012.
DOI : 10.1007/978-3-642-34106-9_18
URL : https://hal.archives-ouvertes.fr/hal-00830033

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

S. Bubeck and A. Slivkins, The best of both worlds: stochastic and adversarial bandits, Conference on Learning Theory (COLT), 2012.

P. Kohli, M. Salek, and G. Stoddard, A Fast Bandit Algorithm for Recommendation to Users With Heterogeneous Tastes, 27th AAAI Conference on Artificial Intelligence, pp.1135-1141, 2013.

F. Radlinski, R. Kleinberg, and T. Joachims, Learning diverse rankings with multi-armed bandits, Proceedings of the 25th International Conference on Machine Learning, ICML '08, pp.784-791, 2008.
DOI : 10.1145/1390156.1390255
URL : http://www.cs.cornell.edu/People/tj/publications/radlinski_etal_08a.pdf