S. Agrawal and N. Devanur, Bandits with concave rewards and convex knapsacks, Proceedings of the Fifteenth ACM Conference on Economics and Computation, EC '14, pp.989-1006, 2014.
DOI : 10.1145/2600057.2602844

S. Agrawal and N. Goyal, Analysis of Thompson sampling for the multi-armed bandit problem. arXiv preprint, 2011.

S. Agrawal and N. Goyal, Further optimal regret bounds for Thompson sampling. arXiv preprint, 2012.
DOI : 10.1145/3088510

V. Anantharam, P. Varaiya, and J. Walrand, Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays, Part I: I.I.D. rewards, IEEE Transactions on Automatic Control, vol.32, issue.11, pp.968-976, 1987.
DOI : 10.1109/tac.1987.1104491

J.-Y. Audibert, S. Bubeck, and G. Lugosi, Minimax policies for combinatorial prediction games. arXiv preprint, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00624463

A. Badanidiyuru, R. Kleinberg, and A. Slivkins, Bandits with Knapsacks, 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pp.207-216, 2013.
DOI : 10.1109/FOCS.2013.30

A. Burnetas and M. Katehakis, Optimal Adaptive Policies for Sequential Allocation Problems, Advances in Applied Mathematics, vol.17, issue.2, pp.122-142, 1996.
DOI : 10.1006/aama.1996.0007

O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, and G. Stoltz, Kullback-Leibler upper confidence bounds for optimal sequential allocation, The Annals of Statistics, vol.41, issue.3, pp.1516-1541, 2013.
DOI : 10.1214/13-AOS1119SUPP

N. Cesa-bianchi and G. Lugosi, Combinatorial bandits, Journal of Computer and System Sciences, vol.78, issue.5, pp.1404-1422, 2012.
DOI : 10.1016/j.jcss.2012.01.001

W. Chen, Y. Wang, and Y. Yuan, Combinatorial multi-armed bandit: General framework and applications, Proceedings of the 30th International Conference on Machine Learning, pp.151-159, 2013.

R. Combes, S. Magureanu, A. Proutiere, and C. Laroche, Learning to Rank: Regret Lower Bounds and Efficient Algorithms, Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp.231-244, 2015.
DOI : 10.1145/2745844.2745852

R. Combes, M. S. Talebi, A. Proutiere, and M. Lelarge, Combinatorial Bandits Revisited, Advances in Neural Information Processing Systems, pp.2107-2115, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01257796

G. Dantzig, Discrete-Variable Extremum Problems, Operations Research, vol.5, issue.2, pp.266-288, 1957.
DOI : 10.1287/opre.5.2.266

A. Garivier, P. Ménard, and G. Stoltz, Explore First, Exploit Next: The True Shape of Regret in Bandit Problems, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01276324

J. Gittins, Bandit processes and dynamic allocation indices, Journal of the Royal Statistical Society, Series B, vol.41, issue.2, pp.148-177, 1979.
DOI : 10.1002/9780470980033

R. Karp, Reducibility among combinatorial problems, Complexity of Computer Computations, pp.85-103, 1972.
DOI : 10.1007/978-3-540-68279-0_8

E. Kaufmann, O. Cappé, and A. Garivier, On Bayesian upper confidence bounds for bandit problems, International Conference on Artificial Intelligence and Statistics, pp.592-600, 2012.

E. Kaufmann, N. Korda, and R. Munos, Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, Algorithmic Learning Theory, pp.199-213, 2012.
DOI : 10.1007/978-3-642-34106-9_18

URL : https://hal.archives-ouvertes.fr/hal-00830033

J. Komiyama, J. Honda, and H. Nakagawa, Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays, Proceedings of the 32nd International Conference on Machine Learning, 2015.

N. Korda, E. Kaufmann, and R. Munos, Thompson sampling for 1-dimensional exponential family bandits, Advances in Neural Information Processing Systems, pp.1448-1456, 2013.

B. Kveton, Z. Wen, A. Ashkan, H. Eydgahi, and B. Eriksson, Matroid Bandits: Fast Combinatorial Optimization with Learning, Uncertainty in Artificial Intelligence (UAI), 2014.

B. Kveton, C. Szepesvári, Z. Wen, and A. Ashkan, Cascading Bandits: Learning to Rank in the Cascade Model, Proceedings of the 32nd International Conference on Machine Learning, pp.767-776, 2015.

B. Kveton, Z. Wen, A. Ashkan, and C. Szepesvári, Combinatorial Cascading Bandits, Advances in Neural Information Processing Systems (NIPS), 2015.

P. Lagrée, C. Vernade, and O. Cappé, Multiple-Play Bandits in the Position-Based Model, 2016.

T. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8

H. Li and Y. Xia, Infinitely Many-Armed Bandits with Budget Constraints, AAAI, pp.2182-2188, 2017.

A. Luedtke, E. Kaufmann, and A. Chambaz, Asymptotically Optimal Algorithms for Multiple Play Bandits with Partial Feedback, 2016.

R Core Team, R: A language and environment for statistical computing, R Foundation for Statistical Computing, 2014.

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI : 10.1090/S0002-9904-1952-09620-8

W. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, issue.3-4, pp.285-294, 1933.
DOI : 10.1093/biomet/25.3-4.285

L. Tran-Thanh, A. Chapman, A. Rogers, and N. R. Jennings, Knapsack Based Optimal Policies for Budget-Limited Multi-Armed Bandits, AAAI, 2012.

Z. Wen, B. Kveton, and A. Ashkan, Efficient Learning in Large-Scale Combinatorial Semi-Bandits, International Conference on Machine Learning (ICML), 2015.

Y. Xia, H. Li, T. Qin, N. Yu, and T.-Y. Liu, Thompson Sampling for Budgeted Multi-Armed Bandits, IJCAI, pp.3960-3966, 2015.

Y. Xia, W. Ding, X.-D. Zhang, N. Yu, and T. Qin, Budgeted bandit problems with continuous random costs, Asian Conference on Machine Learning, pp.317-332, 2016.

Y. Xia, T. Qin, W. Ma, N. Yu, and T.-Y. Liu, Budgeted Multi-Armed Bandits with Multiple Plays, IJCAI, pp.2210-2216, 2016.