S. Agrawal and N. Goyal, Further optimal regret bounds for Thompson sampling, Proceedings of the 16th Conference on Artificial Intelligence and Statistics, 2013.

S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, vol.5, pp.1-122, 2012.

O. Cappé, A. Garivier, O. Maillard, R. Munos, and G. Stoltz, Kullback-Leibler upper confidence bounds for optimal sequential allocation, The Annals of Statistics, vol.41, issue.3, pp.1516-1541, 2013.

O. Chapelle and L. Li, An empirical evaluation of Thompson sampling, Advances in Neural Information Processing Systems, pp.2249-2257, 2011.

A. Chuklin, I. Markov, and M. de Rijke, Click models for web search, Synthesis Lectures on Information Concepts, Retrieval, and Services, vol.7, issue.3, pp.1-115, 2015.

R. Combes and A. Proutière, Unimodal bandits: Regret lower bounds and optimal algorithms, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01092662

R. Combes, S. Magureanu, and A. Proutière, Minimal exploration in structured stochastic bandits, Advances in Neural Information Processing Systems, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02395029

K. Jun, R. Willett, S. Wright, and R. Nowak, Bilinear bandits with low-rank structure, 2019.

S. Katariya, B. Kveton, C. Szepesvári, C. Vernade, and Z. Wen, Bernoulli rank-1 bandits for click feedback, IJCAI, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02287914

S. Katariya, B. Kveton, C. Szepesvári, C. Vernade, and Z. Wen, Stochastic rank-1 bandits, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017.

E. Kaufmann, N. Korda, and R. Munos, Thompson sampling: an asymptotically optimal finite-time analysis, Proceedings of the 23rd Conference on Algorithmic Learning Theory, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00830033