Efficient Learning in Stochastic Combinatorial Semi-Bandits

Pierre Perrault

Thesis, Year: 2020

French title: Apprentissage Efficient dans les Problèmes de Semi-Bandits Stochastiques Combinatoires

Affiliations: École Normale Supérieure Paris-Saclay; Scool

Abstract

Combinatorial stochastic semi-bandits arise naturally in many contexts where the exploration/exploitation dilemma appears, such as web content optimization (recommendation, online advertising) and shortest-path routing. The problem is formulated as follows: an agent sequentially optimizes an unknown and noisy objective function defined on the power set $\mathcal{P}([n])$. For each set $A$ it tries, the agent suffers a loss equal to the expected gap between the value of $A$ and that of the optimal solution, while obtaining observations that reduce its uncertainty about the coordinates in $A$. Our objective is to study the efficiency of policies for this problem, focusing on two aspects: statistical efficiency, measured by the regret of the policy (its cumulative loss), which quantifies learning performance; and computational efficiency. It is sometimes difficult to combine these two aspects in a single policy. In this thesis, we propose several directions for improving statistical efficiency while trying to preserve the computational efficiency of policies. In particular, we improve optimistic methods by developing approximation algorithms and by refining the confidence regions they use. We also explore an alternative to optimistic methods, namely randomized methods, and find them to be a serious candidate for combining both types of efficiency.
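
To make the interaction protocol concrete, here is a minimal simulation sketch. It rests on assumptions that do not come from the thesis: the objective is linear (the value of a set A is the sum of the means of its base arms), the allowed actions are all subsets of size m, and outcomes are Bernoulli. The two policies are generic stand-ins for the families mentioned above: an optimistic index policy in the style of CUCB and a Beta-Bernoulli Thompson sampling policy. They are not the algorithms developed in the thesis, and the names `run`, `cucb`, `thompson`, `top_m` and the arm means are hypothetical.

```python
import math
import random


def sample_outcomes(means, rng):
    """Draw one Bernoulli outcome per base arm (assumed noise model)."""
    return [1.0 if rng.random() < mu else 0.0 for mu in means]


def top_m(scores, m):
    """Oracle for the assumed action set: the m base arms with largest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:m]


def run(policy, means, m, horizon, seed=0):
    """Play `horizon` rounds and return the cumulative pseudo-regret.

    Assumptions (illustrative only): f(A) = sum of the means of arms in A,
    |A| = m, and semi-bandit feedback reveals only the outcomes of arms in A.
    """
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n    # number of observations of each base arm
    sums = [0.0] * n    # sum of observed outcomes of each base arm
    best = sum(sorted(means, reverse=True)[:m])
    regret = 0.0
    for t in range(1, horizon + 1):
        action = policy(counts, sums, t, m, rng)
        outcomes = sample_outcomes(means, rng)
        for i in action:  # semi-bandit feedback: only coordinates in A are seen
            counts[i] += 1
            sums[i] += outcomes[i]
        regret += best - sum(means[i] for i in action)
    return regret


def cucb(counts, sums, t, m, rng):
    """Optimistic policy: per-arm upper confidence index, then the oracle."""
    index = []
    for c, s in zip(counts, sums):
        if c == 0:
            index.append(float("inf"))  # unobserved arms are played first
        else:
            # radius sqrt(3 ln t / (2 c)), written as sqrt(1.5 ln t / c)
            index.append(s / c + math.sqrt(1.5 * math.log(t) / c))
    return top_m(index, m)


def thompson(counts, sums, t, m, rng):
    """Randomized policy: sample Beta(1+s, 1+c-s) posteriors, then the oracle."""
    theta = [rng.betavariate(1 + s, 1 + c - s) for c, s in zip(counts, sums)]
    return top_m(theta, m)


if __name__ == "__main__":
    means = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # hypothetical arm means
    for name, pol in [("CUCB", cucb), ("Thompson", thompson)]:
        print(name, round(run(pol, means, m=2, horizon=5000), 1))
```

Both policies call the same oracle (`top_m`), which keeps each round cheap; they differ only in how the per-arm scores fed to the oracle are built (upper confidence bounds versus posterior samples).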

Keywords

Combinatorial bandit; confidence regions; computational efficiency

Domains

Mathematics [math]; Probability [math.PR]; Machine Learning [stat.ML]; Computer Science [cs]; Statistics [stat]; Operations Research [math.OC]

Main file

phd.pdf (3.51 MB)

Origin: files produced by the author(s)

https://theses.hal.science/tel-03093268

Submitted on: Sunday, January 3, 2021, 19:07:34

Last modified on: Thursday, April 25, 2024, 03:18:01

Long-term archiving on: Sunday, April 4, 2021, 18:25:48

Dates and versions

tel-03093268, version 1 (January 3, 2021)

Identifiers

HAL Id: tel-03093268, version 1

Cite

Pierre Perrault. Efficient Learning in Stochastic Combinatorial Semi-Bandits. Mathematics [math]. Université Paris-Saclay, 2020. English. ⟨NNT : ⟩. ⟨tel-03093268⟩

Collections

CNRS INRIA ENS-CACHAN CRISTAL INRIA2 TDS-MACS UNIV-LILLE CRISTAL-SCOOL ENS-PARIS-SACLAY
