A. Badanidiyuru, R. Kleinberg, and A. Slivkins, Bandits with knapsacks, Proc. of FOCS, 2013.

J. Broder and P. Rusmevichientong, Dynamic pricing under a general parametric choice model, Operations Research, vol.60, issue.4, pp.965-980, 2012.

S. Bubeck, V. Perchet, and P. Rigollet, Bounded regret in stochastic multi-armed bandits, Proc. of COLT, 2013.

N. Cesa-Bianchi and G. Lugosi, Combinatorial bandits, J. Comput. Syst. Sci., vol.78, issue.5, pp.1404-1422, 2012.

S. S. Chandramouli, Multi-armed bandit problem: some insights, 2013.

R. Combes and A. Proutiere, Unimodal bandits: Regret lower bounds and optimal algorithms, Proc. of ICML, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01092662

R. Combes, A. Proutiere, D. Yun, J. Ok, and Y. Yi, Optimal rate sampling in 802.11 systems, Proc. of IEEE INFOCOM, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01092697

E. W. Cope, Regret and convergence bounds for a class of continuum-armed bandit problems, IEEE Trans. Automat. Contr, vol.54, issue.6, pp.1243-1253, 2009.

A. Garivier, Informational confidence bounds for self-normalized averages and applications, Proc. of ITW, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00862062

A. Garivier and O. Cappé, The KL-UCB algorithm for bounded stochastic bandits and beyond, Proc. of COLT, 2011.

A. György, T. Linder, G. Lugosi, and G. Ottucsák, The on-line shortest path problem under partial monitoring, Journal of Machine Learning Research, 2007.

C. Jiang and R. Srikant, Bandits with budgets, Proc. of CDC, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01257889

T. Kailath, The divergence and Bhattacharyya distance measures in signal selection, IEEE Trans. Communications, vol.15, issue.1, pp.52-60, 1967.

E. Kaufmann, N. Korda, and R. Munos, Thompson sampling: An asymptotically optimal finite-time analysis, Proc. of ALT, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00830033

R. Kleinberg, Nearly tight bounds for the continuum-armed bandit problem, Proc. of NIPS, 2004.

R. Kleinberg, A. Niculescu-Mizil, and Y. Sharma, Regret bounds for sleeping experts and bandits, Proc. of COLT, 2008.

T. Lai, Adaptive treatment allocation and the multi-armed bandit problem, The Annals of Statistics, vol.15, issue.3, pp.1091-1114, 1987.

T. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.

A. Slivkins, Dynamic ad allocation: Bandits with budgets, 2013.

A. Slivkins, F. Radlinski, and S. Gollapudi, Ranked bandits in metric spaces: learning diverse rankings over large document collections, Journal of Machine Learning Research, 2013.

W. R. Thompson, On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, issue.3-4, pp.285-294, 1933.

L. Tran-Thanh, A. Chapman, E. M. de Cote, A. Rogers, and N. R. Jennings, Epsilon-first policies for budget-limited multi-armed bandits, Proc. of AAAI, 2010.

L. Tran-Thanh, A. Chapman, A. Rogers, and N. R. Jennings, Knapsack based optimal policies for budget-limited multi-armed bandits, Proc. of AAAI, 2012.

A. B. Tsybakov, Introduction to non-parametric estimation, 2008.

J. Yu and S. Mannor, Unimodal bandits, Proc. of ICML, 2011.