Online learning with malicious noise and the closure algorithm Finite-time analysis of the multiarmed bandit problem, ACBF02] Peter Auer, pp.83-99235, 1998. ,
The nonstochastic multiarmed bandit problem ,
Thompson sampling for contextual bandits with linear payoffs. CoRR, 2012. [BL05] Léon Bottou and Yann LeCun. On-line learning for very large datasets, BMSS08] Sébastien Bubeck, Rémi Munos, Gilles Stoltz, and Csaba Szepesvári, pp.48-77137, 2002. ,
Mortal multiarmed bandits, NIPS, pp.273-280, 2008. ,
Contextual bandits with linear payoff functions, JMLR Proceedings, pp.208-214, 2011. ,
Efficient optimal learning for contextual bandits, 1106. ,
A stochastic bandit algorithm for scratch games ,
JMLR.org, 2012. [FU13] Raphaël Feraud and Tanguy Urvoy. Exploration and exploitation of scratch games, JMLR Proceedings Machine Learning, pp.129-143377, 2013. ,
Feature selection as a one-player game, Omnipress, pp.359-366, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00484049
Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, Algorithmic Learning Theory, Proc. of the 23rd International Conference (ALT), volume LNCS 7568, pp.199-213, 2012. ,
DOI : 10.1007/978-3-642-34106-9_18
URL : https://hal.archives-ouvertes.fr/hal-00830033
Regret bounds for sleeping experts and bandits, COLT, pp.425-436, 2008. ,
Bandit Based Monte-Carlo Planning, KSST08] Sham M. Kakade, Shai Shalev-Shwartz, and Ambuj Tewari Proceedings of the 25th International Conference on Machine Learning, ICML '08, pp.282-293, 2006. ,
DOI : 10.1007/11871842_29
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296
Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985. ,
DOI : 10.1016/0196-8858(85)90002-8
Playing atari with deep reinforcement learning, 2013. ,
Parallel distributed processing : Explorations in the microstructure of cognition chapter Learning Internal Representations by Error Propagation [Ros58] Frank Rosenblatt. The perceptron : A probabilistic model for information storage and organization in the brain, Psychological Review, vol.1, issue.6, pp.318-362, 1958. ,
Pac-bayesian analysis of contextual bandits, NIPS, pp.1683-1691, 2011. ,
Programming backgammon using self-teaching neural nets, Artificial Intelligence, vol.134, issue.1-2, pp.181-199, 2002. ,
DOI : 10.1016/S0004-3702(01)00110-2
URL : http://doi.org/10.1016/s0004-3702(01)00110-2
On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, vol.25, pp.285-294, 1933. ,