S. Agrawal and N. Goyal, Further optimal regret bounds for Thompson sampling, Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AIStats), pp.99-107, 2013.

J. Audibert and S. Bubeck, Best arm identification in multi-armed bandits, Proceedings of the 23rd Annual Conference on Learning Theory (COLT), 2010.
URL : https://hal.archives-ouvertes.fr/hal-00654404

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multi-armed bandit problem, Machine Learning, vol.47, issue.2-3, pp.235-256, 2002.

P. Auer, C. K. Chiang, R. Ortner, and M. M. Drugan, Pareto front identification from stochastic bandit feedback, Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AIStats), pp.939-947, 2016.

A. Baransi, O. Maillard, and S. Mannor, Sub-sampling for multi-armed bandits, Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01025651

D. Baudry, E. Kaufmann, and O. Maillard, Sub-sampling for efficient non-parametric bandit exploration, Advances in Neural Information Processing Systems, vol.33, 2020.

S. Bubeck, R. Munos, and G. Stoltz, Pure exploration in multi-armed bandits problems, Proceedings of the 20th International Conference on Algorithmic Learning Theory (ALT), pp.23-37, 2009.

O. Cappé, A. Garivier, O. A. Maillard, R. Munos, and G. Stoltz, Kullback-Leibler upper confidence bounds for optimal sequential allocation, Annals of Statistics, vol.41, issue.3, pp.1516-1541, 2013.

A. Carpentier and A. Locatelli, Tight (lower) bounds for the fixed budget best arm identification bandit problem, Proceedings of the 29th Annual Conference on Learning Theory (COLT), 2016.

H. P. Chan, The multi-armed bandit problem: An efficient nonparametric solution, Annals of Statistics, vol.48, issue.1, pp.346-373, 2020.

H. Chernoff, Sequential design of experiments, The Annals of Mathematical Statistics, vol.30, issue.3, pp.755-770, 1959.

S. de Rooij, T. van Erven, P. D. Grünwald, and W. M. Koolen, Follow the leader if you can, hedge if you must, Journal of Machine Learning Research, vol.15, pp.1281-1316, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00920549

R. Degenne, W. Koolen, and P. Ménard, Non-asymptotic pure exploration by solving games, Advances in Neural Information Processing Systems, vol.32, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02402665

R. Degenne and W. M. Koolen, Pure exploration with multiple correct answers, Advances in Neural Information Processing Systems, vol.32, 2019.

R. Degenne, P. Ménard, X. Shang, and M. Valko, Gamification of pure exploration for linear bandits, Proceedings of the 37th International Conference on Machine Learning (ICML), 2020.
URL : https://hal.archives-ouvertes.fr/hal-02884330

R. Degenne, H. Shao, and W. M. Koolen, Structure Adaptive Algorithms for Stochastic Bandits, Proceedings of the 37th International Conference on Machine Learning (ICML), 2020.

M. M. Drugan and A. Nowé, Designing multi-objective multi-armed bandits algorithms: A study, Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), pp.2358-2365, 2013.

M. M. Drugan and A. Nowé, Scalarization based Pareto optimal set of arms identification algorithms, Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN), pp.2690-2697, 2014.

A. Durand, C. Achilleos, D. Iacovides, K. Strati, and J. Pineau, Contextual bandits for adapting treatment in a mouse model of de novo carcinogenesis, Proceedings of the 3rd Machine Learning for Health Care Conference (MLHC), 2018.

E. Even-Dar, S. Mannor, and Y. Mansour, Action elimination and stopping conditions for reinforcement learning, Proceedings of the 20th International Conference on Machine Learning (ICML), pp.162-169, 2003.

V. Gabillon, M. Ghavamzadeh, and A. Lazaric, Best arm identification: A unified approach to fixed budget and fixed confidence, Advances in Neural Information Processing Systems 25 (NIPS), pp.3212-3220, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00772615

A. Garivier and E. Kaufmann, Optimal best arm identification with fixed confidence, Proceedings of the 29th Annual Conference on Learning Theory (COLT), 2016.
URL : https://hal.archives-ouvertes.fr/hal-01273838

A. Garivier, P. Ménard, and G. Stoltz, Explore first, exploit next: The true shape of regret in bandit problems, Mathematics of Operations Research, vol.44, issue.2, pp.377-399, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01276324

J. Honda and A. Takemura, Non-asymptotic analysis of a new bandit algorithm for semi-bounded rewards, Journal of Machine Learning Research, vol.16, pp.3721-3756, 2015.

X. Huo and F. Fu, Risk-aware multi-armed bandit problem with application to portfolio selection, Royal Society Open Science, vol.4, issue.11, 2017.

K. Jamieson, M. Malloy, R. Nowak, and S. Bubeck, lil'UCB: An optimal exploration algorithm for multi-armed bandits, Proceedings of the 27th Annual Conference on Learning Theory (COLT), pp.423-439, 2014.

S. Kalyanakrishnan, A. Tewari, P. Auer, and P. Stone, PAC subset selection in stochastic multi-armed bandits, Proceedings of the 29th International Conference on Machine Learning (ICML), pp.655-662, 2012.

Z. Karnin, T. Koren, and O. Somekh, Almost optimal exploration in multi-armed bandits, Proceedings of the 30th International Conference on Machine Learning (ICML), pp.1238-1246, 2013.

J. Katz-Samuels and C. Scott, Top feasible arm identification, Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AIStats), 2019.

E. Kaufmann and A. Garivier, Learning the distribution with largest mean: two bandit frameworks, ESAIM: Proceedings and Surveys, vol.60, pp.114-131, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01449822

E. Kaufmann and W. Koolen, Mixture martingales revisited with applications to sequential tests and confidence intervals, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01886612

E. Kaufmann, N. Korda, and R. Munos, Thompson sampling: An asymptotically optimal finite-time analysis, Proceedings of the 23rd International Conference on Algorithmic Learning Theory (ALT), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00830033

N. Korda, E. Kaufmann, and R. Munos, Thompson sampling for 1-dimensional exponential family bandits, Advances in Neural Information Processing Systems 26 (NIPS), pp.1448-1456, 2013.

J. Kwon and V. Perchet, Online learning and Blackwell approachability with partial monitoring: Optimal convergence rates, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AIStats), vol.54, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02734035

T. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.

S. Lu, G. Wang, Y. Hu, and L. Zhang, Multi-objective generalized linear bandits, Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), pp.3080-3086, 2019.

P. Ménard, Gradient ascent for active exploration in bandit problems, 2019.

V. Perchet, Approachability of convex sets in games with partial monitoring, Journal of Optimization Theory and Applications, vol.149, issue.3, pp.665-677, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00490434

V. Perchet, Approachability, regret and calibration: Implications and equivalences, Journal of Dynamics and Games, vol.1, issue.2, pp.181-254, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00773218

C. Qin, D. Klabjan, and D. Russo, Improving the expected improvement algorithm, Advances in Neural Information Processing Systems 30 (NIPS), pp.5381-5391, 2017.

D. Russo, Simple Bayesian algorithms for best arm identification, Proceedings of the 29th Annual Conference on Learning Theory (COLT), 2016.

X. Shang, R. De-heide, E. Kaufmann, P. Ménard, and M. Valko, Fixed-confidence guarantees for Bayesian best-arm identification, Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AIStats), 2020.
URL : https://hal.archives-ouvertes.fr/hal-02330187

W. R. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, pp.285-294, 1933.

X. Yu, H. Shao, M. R. Lyu, and I. King, Pure exploration of multi-armed bandits with heavy-tailed payoffs, Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence (UAI), 2018.

W. Zeng, M. Fang, J. Shao, and M. Shang, Uncovering the essential links in online commercial networks, Scientific Reports, vol.6, 2016.

M. Zuluaga, A. Krause, G. Sergent, and M. Püschel, Active learning for multi-objective optimization, Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.