C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds et al., Learning to rank using gradient descent, Proc. of ICML, 2005.

S. Pandey, D. Agarwal, D. Chakrabarti, and V. Josifovski, Bandits for taxonomies: A model based approach, Proc. of SIAM SDM, 2007.

F. Radlinski and T. Joachims, Active exploration for learning rankings from clickthrough data, Proc. of ACM SIGKDD, 2007.

M. J. Streeter, D. Golovin, and A. Krause, Online learning of assignments, Proc. of NIPS, 2009.

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol. 58, no. 5, pp. 527–535, 1952.

J. Gittins, Multi-armed Bandit Allocation Indices, John Wiley & Sons, 1989.

F. Radlinski, R. Kleinberg, and T. Joachims, Learning diverse rankings with multi-armed bandits, Proc. of ICML, 2008.

Y. Yue and T. Joachims, Interactively optimizing information retrieval systems as a dueling bandits problem, Proc. of ICML, 2009.

Y. Yue and C. Guestrin, Linear submodular bandits and their application to diversified retrieval, Proc. of NIPS, 2011.

S. Khuller, A. Moss, and J. S. Naor, The budgeted maximum coverage problem, Information Processing Letters, vol. 70, no. 1, pp. 39–45, 1999.

P. Kohli, M. Salek, and G. Stoddard, A fast bandit algorithm for recommendations to users with heterogeneous tastes, Proc. of AAAI, 2013.

S. Agrawal, Y. Ding, A. Saberi, and Y. Ye, Correlation robust stochastic optimization, Proc. of ACM SODA, 2010.

A. Slivkins, F. Radlinski, and S. Gollapudi, Ranked bandits in metric spaces: learning optimally diverse rankings over large document collections, Journal of Machine Learning Research, 2013.

S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, vol. 5, no. 1, pp. 1–122, 2012.

R. Agrawal, The continuum-armed bandit problem, SIAM Journal on Control and Optimization, vol. 33, no. 6, pp. 1926–1951, 1995.

R. Kleinberg, A. Slivkins, and E. Upfal, Multi-armed bandits in metric spaces, Proc. of STOC, 2008.

S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári, Online optimization in X-armed bandits, Proc. of NIPS, 2008. URL: https://hal.archives-ouvertes.fr/inria-00329797

S. Magureanu, R. Combes, and A. Proutiere, Lipschitz bandits: Regret lower bound and optimal algorithms, Proc. of COLT, 2014.

V. Dani, T. P. Hayes, and S. M. Kakade, Stochastic linear optimization under bandit feedback, Proc. of COLT, 2008.

A. Flaxman, A. T. Kalai, and H. B. Mcmahan, Online convex optimization in the bandit setting: gradient descent without a gradient, Proc. of ACM SODA, 2005.

L. Bui, R. Johari, and S. Mannor, Clustered bandits, 2012.

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol. 6, no. 1, pp. 4–22, 1985.

T. L. Graves and T. L. Lai, Asymptotically efficient adaptive choice of control laws in controlled Markov chains, SIAM Journal on Control and Optimization, vol. 35, no. 3, pp. 715–743, 1997.

A. Garivier and O. Cappé, The KL-UCB algorithm for bounded stochastic bandits and beyond, Proc. of COLT, 2011.

R. Combes and A. Proutiere, Unimodal bandits: Regret lower bounds and optimal algorithms, Proc. of ICML, 2014. URL: https://hal.archives-ouvertes.fr/hal-01092662