Further optimal regret bounds for Thompson sampling, Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AIStats), pp.99-107, 2013.
Best arm identification in multi-armed bandits, Proceedings of the 23rd Annual Conference on Learning Theory (CoLT), 2010.
URL : https://hal.archives-ouvertes.fr/hal-00654404
Finite-time analysis of the multi-armed bandit problem, Machine Learning Journal, vol.47, issue.2-3, pp.235-256, 2002.
Pareto front identification from stochastic bandit feedback, Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AIStats), pp.939-947, 2016.
Subsampling for multi-armed bandits, Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01025651
Subsampling for efficient non-parametric bandit exploration, Advances in Neural Information Processing Systems, vol.34, 2020.
Pure exploration in multi-armed bandits problems, Proceedings of the 20th International Conference on Algorithmic Learning Theory (ALT), pp.23-37, 2009.
Kullback-Leibler upper confidence bounds for optimal sequential allocation, Annals of Statistics, vol.41, issue.3, pp.1516-1541, 2013.
Tight (lower) bounds for the fixed budget best arm identification bandit problem, Proceedings of the 29th Annual Conference on Learning Theory (CoLT), 2016.
The multi-armed bandit problem: An efficient nonparametric solution, Annals of Statistics, vol.48, issue.1, pp.346-373, 2020.
Sequential design of experiments, The Annals of Mathematical Statistics, vol.30, issue.3, pp.755-770, 1959.
Follow the leader if you can, hedge if you must, Journal of Machine Learning Research, vol.15, pp.1281-1316, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00920549
Non-asymptotic pure exploration by solving games, Advances in Neural Information Processing Systems, vol.32, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02402665
Pure exploration with multiple correct answers, Advances in Neural Information Processing Systems, vol.32, 2019.
Gamification of pure exploration for linear bandits, Proceedings of the 37th International Conference on Machine Learning (ICML), 2020.
URL : https://hal.archives-ouvertes.fr/hal-02884330
Structure adaptive algorithms for stochastic bandits, Proceedings of the 37th International Conference on Machine Learning (ICML), 2020.
Designing multiobjective multi-armed bandits algorithms: A study, Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), pp.2358-2365, 2013.
Scalarization based Pareto optimal set of arms identification algorithms, Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN), pp.2690-2697, 2014.
Contextual bandits for adapting treatment in a mouse model of de Novo Carcinogenesis, Proceedings of the 3rd Machine Learning for Health Care Conference (MLHC), 2018.
Action elimination and stopping conditions for reinforcement learning, Proceedings of the 20th International Conference on Machine Learning (ICML), pp.162-169, 2003.
Best arm identification: A unified approach to fixed budget and fixed confidence, Advances in Neural Information Processing Systems 25 (NIPS), pp.3212-3220, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00772615
Optimal best arm identification with fixed confidence, Proceedings of the 29th Annual Conference on Learning Theory (CoLT), 2016.
URL : https://hal.archives-ouvertes.fr/hal-01273838
Explore first, exploit next: The true shape of regret in bandit problems, Mathematics of Operations Research, vol.44, issue.2, pp.377-399, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01276324
Non-asymptotic analysis of a new bandit algorithm for semi-bounded rewards, Journal of Machine Learning Research, vol.16, pp.3721-3756, 2015.
Risk-aware multi-armed bandit problem with application to portfolio selection, Royal Society Open Science, vol.4, issue.11, 2017.
lil'UCB: An optimal exploration algorithm for multi-armed bandits, Proceedings of the 27th Annual Conference on Learning Theory (CoLT), pp.423-439, 2014.
PAC subset selection in stochastic multi-armed bandits, Proceedings of the 29th International Conference on Machine Learning (ICML), pp.655-662, 2012.
Almost optimal exploration in multi-armed bandits, Proceedings of the 30th International Conference on Machine Learning (ICML), pp.1238-1246, 2013.
Top feasible arm identification, Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AIStats), 2019.
Learning the distribution with largest mean: two bandit frameworks, ESAIM: Proceedings and Surveys, vol.60, pp.114-131, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01449822
Mixture martingales revisited with applications to sequential tests and confidence intervals, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01886612
Thompson sampling: An asymptotically optimal finite-time analysis, Proceedings of the 23rd International Conference on Algorithmic Learning Theory (ALT), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00830033
Thompson sampling for 1-dimensional exponential family bandits, Advances in Neural Information Processing Systems 26 (NIPS), pp.1448-1456, 2013.
Online learning and Blackwell approachability with partial monitoring: Optimal convergence rates, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AIStats), vol.54, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02734035
Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
Multiobjective generalized linear bandits, Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), pp.3080-3086, 2019.
Gradient ascent for active exploration in bandit problems, 2019.
Approachability of convex sets in games with partial monitoring, Journal of Optimization Theory and Applications, vol.149, issue.3, pp.665-677, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00490434
Approachability, regret and calibration: Implications and equivalences, Journal of Dynamics and Games, vol.1, issue.2, pp.181-254, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00773218
Improving the expected improvement algorithm, Advances in Neural Information Processing Systems 30 (NIPS), pp.5381-5391, 2017.
Simple Bayesian algorithms for best arm identification, Proceedings of the 29th Annual Conference on Learning Theory (CoLT), 2016.
Fixed-confidence guarantees for Bayesian best-arm identification, Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AIStats), 2020.
URL : https://hal.archives-ouvertes.fr/hal-02330187
On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, pp.285-294, 1933.
Pure exploration of multi-armed bandits with heavy-tailed payoffs, Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence (UAI), 2018.
Uncovering the essential links in online commercial networks, Scientific Reports, vol.6, 2016.
Active learning for multi-criterion optimization, Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.