Efficient Learning in Stochastic Combinatorial Semi-Bandits

Scool
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract : Combinatorial stochastic semi-bandits appear naturally in many contexts where the exploration/exploitation dilemma arises, such as web content optimization (recommendation, online advertising) or shortest-path routing. The problem is formulated as follows: an agent sequentially optimizes an unknown and noisy objective function defined on a power set $\mathcal{P}([n])$. For each set $A$ it tries, the agent suffers a loss equal to the expected deviation from the optimal solution, while obtaining observations that reduce its uncertainty about the coordinates in $A$. Our objective is to study the efficiency of policies for this problem, focusing on two aspects: statistical efficiency, where the criterion is the regret suffered by the policy (the cumulative loss), which measures learning performance; and computational efficiency. It is sometimes difficult to combine these two aspects in a single policy. In this thesis, we propose different directions for improving statistical efficiency while trying to maintain the computational efficiency of policies. In particular, we improve optimistic methods by developing approximation algorithms and refining the confidence regions used. We also explore an alternative to optimistic methods, namely randomized methods, and find them to be serious candidates for combining the two types of efficiency.
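The interaction protocol described in the abstract can be illustrated with a minimal simulation. This is only a sketch under several assumptions that are not taken from the thesis: the allowed sets are all subsets of size m, the reward is the sum of the chosen coordinates, outcomes are Bernoulli, and the policy is a generic optimistic (upper-confidence-bound) rule with an illustrative bonus constant. The name run_cucb and its parameters are hypothetical.

```python
import math
import random


def run_cucb(means, m, horizon, seed=0):
    """Simulate a generic optimistic policy on size-m subsets (illustrative).

    Returns the cumulative regret and the per-coordinate observation counts.
    """
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n   # number of observations of each coordinate
    sums = [0.0] * n   # sum of observed outcomes of each coordinate
    # Expected reward of the best size-m set (unknown to the agent).
    best = sum(sorted(means, reverse=True)[:m])
    regret = 0.0
    for t in range(1, horizon + 1):
        # Optimistic index: empirical mean plus a confidence bonus.
        # The constant 1.5 is an assumption chosen for illustration.
        index = [
            float("inf") if counts[i] == 0
            else sums[i] / counts[i] + math.sqrt(1.5 * math.log(t) / counts[i])
            for i in range(n)
        ]
        # Maximizing a sum over size-m subsets: keep the m largest indices.
        action = sorted(range(n), key=lambda i: index[i], reverse=True)[:m]
        # Semi-bandit feedback: one noisy observation per chosen coordinate.
        for i in action:
            outcome = 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            sums[i] += outcome
        # Loss of the round: expected deviation from the optimal set.
        regret += best - sum(means[i] for i in action)
    return regret, counts
```

For example, run_cucb([0.9, 0.8, 0.2, 0.1], m=2, horizon=2000) returns the cumulative regret together with how often each coordinate was observed. With size-m subsets the maximization step is a simple sort; for richer action sets, such as paths in a graph, this step is where the computational cost mentioned in the abstract would arise.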
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-03093268
Contributor : Pierre Perrault
Submitted on : Sunday, January 3, 2021 - 7:07:34 PM
Last modification on : Friday, January 21, 2022 - 3:12:39 AM
Long-term archiving on : Sunday, April 4, 2021 - 6:25:48 PM

File

phd.pdf
Files produced by the author(s)

Identifiers

• HAL Id : tel-03093268, version 1

Citation

Pierre Perrault. Efficient Learning in Stochastic Combinatorial Semi-Bandits. Mathematics [math]. Université Paris-Saclay, 2020. English. ⟨tel-03093268⟩
