Policies for Contextual Bandit Problems with Count Payoffs.

Abstract: The contextual bandit problem has attracted considerable interest in recent years. It is a sequential decision process in which, at each iteration, an agent chooses an action to perform based on some knowledge about the decision environment and the currently available actions, with the aim of maximizing the cumulative reward collected over time. Many instances of the problem exist, depending on the kind of rewards collected (real-valued, binary, or natural numbers), and various algorithms are known to be efficient for some of these instances, either empirically or theoretically. In this paper we focus on the case of count payoffs, that is, bandit problems whose rewards are non-negative integers, potentially unbounded. Based on a Bayesian Poisson regression model, we propose two new contextual bandit algorithms for this case, which has several concrete real-life applications: an Upper Confidence Bound algorithm and a Thompson Sampling strategy. Both approaches have the advantage of remaining analytically tractable and computationally efficient. We evaluate the algorithms on both simulated data and a real-world scenario of spread maximization on a social network.
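The abstract does not give the models in detail. As a rough illustration of Thompson Sampling with count payoffs, the following is a minimal sketch under simplifying assumptions: it drops the context entirely and places a conjugate Gamma prior on each arm's Poisson rate, so the posterior after observing counts is Gamma(alpha + sum of counts, beta + number of pulls). This is a swapped-in, non-contextual simplification, not the authors' Bayesian Poisson regression algorithm; the names `GammaPoissonThompson`, `poisson_sample`, and `run` are hypothetical.

```python
import math
import random


def poisson_sample(rng, lam):
    # Knuth's multiplication method; adequate for the small rates used here
    # (exp(-lam) underflows for very large lam).
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class GammaPoissonThompson:
    """Hypothetical non-contextual Thompson Sampling for Poisson (count) rewards.

    Each arm has an unknown rate with a Gamma(shape=alpha, rate=beta) prior.
    """

    def __init__(self, n_arms, alpha0=1.0, beta0=1.0, seed=None):
        self.alpha = [alpha0] * n_arms
        self.beta = [beta0] * n_arms
        self.rng = random.Random(seed)

    def select(self):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        samples = [self.rng.gammavariate(a, 1.0 / b)
                   for a, b in zip(self.alpha, self.beta)]
        # Play the arm whose sampled rate is largest.
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, count):
        # Conjugate update: add the observed count to the shape, one pull to the rate.
        self.alpha[arm] += count
        self.beta[arm] += 1.0

    def posterior_mean(self, arm):
        return self.alpha[arm] / self.beta[arm]


def run(true_rates, horizon=2000, seed=0):
    # Simulate arms with fixed, assumed Poisson rates and return
    # (total reward, pull counts per arm, the trained agent).
    rng = random.Random(seed)
    agent = GammaPoissonThompson(len(true_rates), seed=seed + 1)
    pulls = [0] * len(true_rates)
    total = 0
    for _ in range(horizon):
        arm = agent.select()
        reward = poisson_sample(rng, true_rates[arm])
        agent.update(arm, reward)
        pulls[arm] += 1
        total += reward
    return total, pulls, agent
```

For example, `run([1.0, 3.0, 5.0])` should concentrate most pulls on the third arm. The conjugate Gamma prior keeps each update to two additions, which loosely mirrors the tractability argument in the abstract; the contextual version described in the paper would instead need a posterior over regression weights.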

Contributor: Thibault Gisselbrecht
Submitted on: Tuesday, August 23, 2016, 11:22:51 AM
Last modification on: Thursday, March 21, 2019, 2:17:50 PM



Thibault Gisselbrecht, Sylvain Lamprier, Patrick Gallinari. Policies for Contextual Bandit Problems with Count Payoffs. 27th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2015), Nov 2015, Vietri Sul Mare, Italy. IEEE, pp. 542-549, 2015. DOI: 10.1109/ICTAI.2015.85. hal-01355404


