A multiplicative UCB strategy for Gamma rewards

Matthieu Geist 1, *
* Corresponding author
1 MALIS - MAchine Learning and Interactive Systems
SUPELEC-Campus Metz, CentraleSupélec
Abstract : We consider the stochastic multi-armed bandit problem in which rewards follow Gamma distributions that are unknown except for a lower bound on the form factor (shape parameter). To handle this problem, we propose a UCB-like strategy whose indices are multiplicative: each index is the sample mean times a scaling factor. We provide an upper bound on the associated regret and illustrate the proposed strategy with some simple experiments.
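To make the idea of a multiplicative index concrete, here is a minimal sketch in Python. The abstract only states that the index is the sample mean times a scaling factor; the particular factor used below, 1 + sqrt(c log t / (k_min n)), is an assumption chosen for illustration and is not the paper's formula. The names `multiplicative_index`, `run_bandit`, `k_min` (standing in for the lower bound on the form factor) and `c` are likewise hypothetical.

```python
import math
import random


def multiplicative_index(sample_mean, pulls, t, k_min, c=2.0):
    """Sample mean times a scaling factor >= 1 (illustrative form only)."""
    # ASSUMPTION: this scaling factor is a placeholder, not the paper's.
    # It shrinks toward 1 as the arm is pulled more often.
    bonus = math.sqrt(c * math.log(t) / (k_min * pulls))
    return sample_mean * (1.0 + bonus)


def run_bandit(arms, horizon, k_min, seed=0):
    """Play `horizon` rounds; `arms` is a list of (shape, scale) pairs."""
    rng = random.Random(seed)
    n_arms = len(arms)
    pulls = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so each sample mean is defined.
            arm = t - 1
        else:
            # Pull the arm with the largest multiplicative index.
            arm = max(
                range(n_arms),
                key=lambda a: multiplicative_index(
                    sums[a] / pulls[a], pulls[a], t, k_min
                ),
            )
        shape, scale = arms[arm]
        # Gamma reward with mean shape * scale.
        sums[arm] += rng.gammavariate(shape, scale)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Two hypothetical arms with means 2.0 and 6.0, both with shape 2.
    print(run_bandit([(2.0, 1.0), (2.0, 3.0)], horizon=2000, k_min=2.0))
```

Unlike the additive bonus of standard UCB, a multiplicative factor scales with the sample mean itself, which matches the fact that the standard deviation of a Gamma reward is proportional to its mean. How the factor must be chosen to obtain the regret bound is addressed in the paper, not in this sketch.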
Document type : Conference paper

https://hal.archives-ouvertes.fr/hal-01258820
Contributor : Matthieu Geist
Submitted on : Tuesday, January 19, 2016 - 3:07:21 PM
Last modification on : Thursday, April 5, 2018 - 12:30:24 PM
Long-term archiving on : Friday, November 11, 2016 - 12:40:27 PM

File

gamma_ucb.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01258820, version 1

Citation

Matthieu Geist. A multiplicative UCB strategy for Gamma rewards. European Workshop on Reinforcement Learning, 2015, Lille, France. ⟨hal-01258820⟩
