Policies for Contextual Bandit Problems with Count Payoffs.

Abstract: The contextual bandit problem has attracted major interest in recent years. It corresponds to a sequential decision process in which, at each iteration, an agent must choose an action to perform, based on some knowledge of the decision environment and the currently available actions, with the aim of maximizing the cumulative reward over time. Many instances of the problem exist, depending on the kind of rewards collected (real, binary, or natural-number valued), and various algorithms are known to be efficient for some of these instances, either empirically or theoretically. In this paper we focus on the case of count payoffs, that is, bandit problems whose rewards are integers and potentially unbounded. Based on a Bayesian Poisson regression model, we propose two new contextual bandit algorithms for this case, which has several concrete real-life applications: an Upper Confidence Bound algorithm and a Thompson Sampling strategy. Our approaches have the advantage of remaining analytically tractable and computationally efficient. We evaluate the algorithms on both simulated data and a real-world scenario of spread maximization on a social network.
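To make the idea of Thompson Sampling with count payoffs concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the algorithm proposed in the paper: it drops the context entirely and replaces the Bayesian Poisson regression model with an independent Gamma-Poisson conjugate model per arm. All names (`GammaPoissonThompson`, `run_simulation`, `sample_poisson`), the prior parameters, and the simulated rates are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson(rate) count with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


class GammaPoissonThompson:
    """Non-contextual Thompson Sampling with a Gamma posterior per arm.

    Assumption: each arm yields Poisson counts with an unknown rate, and the
    prior on every rate is Gamma(prior_shape, prior_rate).
    """

    def __init__(self, n_arms, prior_shape=1.0, prior_rate=1.0, seed=0):
        self.shape = [prior_shape] * n_arms  # prior shape + sum of observed counts
        self.rate = [prior_rate] * n_arms    # prior rate + number of pulls
        self.rng = random.Random(seed)

    def select_arm(self):
        # Sample one plausible Poisson rate per arm and play the largest.
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        draws = [self.rng.gammavariate(a, 1.0 / b)
                 for a, b in zip(self.shape, self.rate)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, count):
        # Conjugate update: Gamma(a, b) becomes Gamma(a + count, b + 1).
        self.shape[arm] += count
        self.rate[arm] += 1.0


def run_simulation(true_rates, horizon=2000, seed=0):
    """Simulate the policy against Poisson arms; return pull counts and policy."""
    policy = GammaPoissonThompson(len(true_rates), seed=seed)
    env_rng = random.Random(seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(horizon):
        arm = policy.select_arm()
        reward = sample_poisson(true_rates[arm], env_rng)
        policy.update(arm, reward)
        pulls[arm] += 1
    return pulls, policy
```

Calling `run_simulation([1.0, 2.0, 5.0])` returns the number of times each simulated arm was played; under these assumptions the arm with the highest rate should end up with most of the pulls. A contextual variant, as studied in the paper, would replace the per-arm rate with a function of the observed context.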
Document type:
Conference paper
27th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2015, Nov 2015, Vietri Sul Mare, Italy. IEEE, pp.542-549, 2015, 〈10.1109/ICTAI.2015.85〉

https://hal.archives-ouvertes.fr/hal-01355404
Contributor: Thibault Gisselbrecht
Submitted on: Tuesday, 23 August 2016 - 11:22:51
Last modified on: Thursday, 22 November 2018 - 14:31:02

Citation

Thibault Gisselbrecht, Sylvain Lamprier, Patrick Gallinari. Policies for Contextual Bandit Problems with Count Payoffs. 27th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2015, Nov 2015, Vietri Sul Mare, Italy. IEEE, pp.542-549, 2015, 〈10.1109/ICTAI.2015.85〉. 〈hal-01355404〉
