Policies for Contextual Bandit Problems with Count Payoffs

Thibault Gisselbrecht; Sylvain Lamprier; Patrick Gallinari

doi:10.1109/ICTAI.2015.85

Conference paper. Year: 2015


Thibault Gisselbrecht

Role: Author
PersonId: 987816

IRT SystemX

Machine Learning and Information Access

Sylvain Lamprier

Role: Author
PersonId: 740402
IdHAL: sylvain-lamprier
ORCID: 0000-0002-2508-922X
IdRef: 142632201

Machine Learning and Information Access

Patrick Gallinari

Role: Author
PersonId: 751615
IdHAL: patrick-gallinari
ORCID: 0000-0001-9060-9001
IdRef: 070709076

Machine Learning and Information Access

Abstract

The contextual bandit problem has attracted major interest in recent years. It corresponds to a sequential decision process in which, at each iteration, an agent has to choose an action to perform, based on some knowledge about the decision environment and the currently available actions, with the aim of maximizing the cumulative reward collected over time. Many instances of the problem exist, depending on the kind of rewards collected (real, binary, or natural numbers), and various algorithms are known to be efficient for some of these instances, either empirically or theoretically. In this paper we focus on the case of count payoffs, that is, bandit problems whose rewards are non-negative integers, potentially unbounded. Based on a Bayesian Poisson regression model, we propose two new contextual bandit algorithms for this case, which has several concrete real-life applications: an Upper Confidence Bound algorithm and a Thompson Sampling strategy. Both approaches have the advantage of remaining analytically tractable and computationally efficient. We evaluate the algorithms on both simulated data and a real-world scenario of spread maximization on a social network.
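The abstract names two strategies, an Upper Confidence Bound algorithm and Thompson Sampling, both built on a Bayesian Poisson model of count rewards. The sketch below is a simplified, non-contextual illustration of that idea and not the authors' algorithm: it assumes each arm's payoff is Poisson with a conjugate Gamma prior (hypothetical prior shape = rate = 1), lets Thompson Sampling play the arm with the largest posterior draw, and swaps in a UCB-style heuristic (posterior mean plus `k` posterior standard deviations, with a hypothetical `k = 2`) in place of the paper's confidence bound. The paper's contextual Bayesian Poisson regression over action features is not reproduced, and `GammaPoissonArm`, `choose_thompson`, `choose_optimistic`, `poisson_sample`, and `simulate` are illustrative names only.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class GammaPoissonArm:
    """Gamma(shape, rate) posterior over one arm's Poisson rate.

    Conjugate update: after observing counts y_1..y_n the posterior is
    Gamma(shape + sum(y), rate + n). The prior values are hypothetical.
    """
    shape: float = 1.0
    rate: float = 1.0

    def update(self, count: int) -> None:
        self.shape += count
        self.rate += 1.0

    def mean(self) -> float:
        return self.shape / self.rate

    def std(self) -> float:
        # Gamma variance is shape / rate**2.
        return math.sqrt(self.shape) / self.rate

    def sample(self, rng: random.Random) -> float:
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        return rng.gammavariate(self.shape, 1.0 / self.rate)


def choose_thompson(arms: list, rng: random.Random) -> int:
    """Thompson Sampling: play the arm whose sampled rate is largest."""
    draws = [arm.sample(rng) for arm in arms]
    return max(range(len(arms)), key=draws.__getitem__)


def choose_optimistic(arms: list, k: float = 2.0) -> int:
    """Hypothetical UCB-style index: posterior mean plus k posterior std devs."""
    scores = [arm.mean() + k * arm.std() for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate(true_rates, horizon, policy="thompson", seed=0):
    """Run one policy against Poisson arms; return (total count, pulls per arm)."""
    rng = random.Random(seed)
    arms = [GammaPoissonArm() for _ in true_rates]
    total, pulls = 0, [0] * len(true_rates)
    for _ in range(horizon):
        if policy == "thompson":
            i = choose_thompson(arms, rng)
        elif policy == "optimistic":
            i = choose_optimistic(arms)
        else:
            raise ValueError(f"unknown policy: {policy}")
        y = poisson_sample(true_rates[i], rng)
        arms[i].update(y)
        pulls[i] += 1
        total += y
    return total, pulls
```

For instance, `simulate([0.5, 1.0, 4.0], 2000)` returns the total count collected and the number of pulls per arm; with rates this well separated, Thompson Sampling should spend most pulls on the third arm. Conjugacy is the design choice that keeps the sketch cheap: each arm's posterior is summarized by two running numbers, a much simpler analogue of the tractability the abstract emphasizes.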

Domains

Artificial Intelligence [cs.AI], Machine Learning [cs.LG]

Contributor: Thibault Gisselbrecht

https://hal.science/hal-01355404

Submitted on: Tuesday, 23 August 2016, 11:22:51

Last modified on: Tuesday, 11 April 2023, 15:16:28

Dates and versions

hal-01355404, version 1 (23-08-2016)

Identifiers

HAL Id: hal-01355404, version 1
DOI: 10.1109/ICTAI.2015.85

Cite

Thibault Gisselbrecht, Sylvain Lamprier, Patrick Gallinari. Policies for Contextual Bandit Problems with Count Payoffs. 27th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2015, Nov 2015, Vietri Sul Mare, Italy. pp. 542-549, ⟨10.1109/ICTAI.2015.85⟩. ⟨hal-01355404⟩

Collections

UPMC CNRS LIP6 IRT-SYSTEMX SORBONNE-UNIVERSITE SU-SCIENCES
