Sequential Resource Allocation in Linear Stochastic Bandits

Marta Soare 1
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract : This thesis studies resource allocation problems in uncertain environments, where an agent sequentially selects which action to take. After each step, the environment returns a noisy observation of the value of the selected action. These observations guide the agent in adapting its resource allocation strategy to reach a given objective. In the most typical setting of this kind, the stochastic multi-armed bandit (MAB), each observation is assumed to be drawn from an unknown probability distribution associated with the selected action and gives no information on the expected value of the other actions. The MAB setting has been widely studied, and optimal allocation strategies have been proposed for various objectives under the MAB assumptions. Here, we consider a variant of the MAB setting in which the environment has a global linear structure, so that by selecting an action the agent also gathers information on the value of the other actions. The agent therefore needs to adapt its resource allocation strategy to exploit this structure. In particular, we study the design of sequences of actions that the agent should take to reach objectives such as (i) identifying the best value with a fixed confidence using a minimum number of pulls, or (ii) minimizing the prediction error on the value of each action. In addition, we investigate how the knowledge gathered by a bandit algorithm in a given environment can be transferred to improve performance in other similar environments.
Document type :
Theses

Cited literature [75 references]

https://hal.archives-ouvertes.fr/tel-01249224
Contributor : Marta Soare
Submitted on : Wednesday, December 30, 2015 - 5:21:14 PM
Last modification on : Thursday, February 21, 2019 - 10:52:49 AM
Long-term archiving on : Tuesday, April 5, 2016 - 1:48:22 PM

Identifiers

  • HAL Id : tel-01249224, version 1

Citation

Marta Soare. Sequential Resource Allocation in Linear Stochastic Bandits. Machine Learning [cs.LG]. Université Lille 1 - Sciences et Technologies, 2015. English. ⟨tel-01249224⟩

Metrics

Record views : 515

File downloads : 439