Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm

Debabrota Basu; Odalric-Ambrym Maillard; Timothée Mathieu

Preprint, Working Paper. Year: 2022

Debabrota Basu

Role: Author
PersonId: 742129
IdHAL: debabrota-basu

Scool

Odalric-Ambrym Maillard

Role: Author
PersonId: 5563
IdHAL: odalric-ambrym-maillard
ORCID: 0000-0001-7935-7026
IdRef: 158055594

Scool

Timothée Mathieu

Role: Author
PersonId: 1130096
IdHAL: timothee-mathieu

Scool

Abstract

In this paper, we study the stochastic bandit problem with k unknown heavy-tailed and corrupted reward distributions, or arms, whose corruption distributions are time-invariant. At each iteration, the player chooses an arm. Given the arm, the environment returns an uncorrupted reward with probability 1−ε and an arbitrarily corrupted reward with probability ε. In our setting, the uncorrupted reward might be heavy-tailed and the corrupted reward might be unbounded. We prove a lower bound on the regret indicating that corrupted and heavy-tailed bandits are strictly harder than uncorrupted or light-tailed bandits. We observe that the environments can be categorised into hardness regimes depending on the suboptimality gap Δ, variance σ, and corruption proportion ε. Following this, we design a UCB-type algorithm, namely HuberUCB, that leverages Huber's estimator for robust mean estimation. HuberUCB leads to tight upper bounds on regret in the proposed corrupted and heavy-tailed setting. To derive the upper bound, we prove a novel concentration inequality for Huber's estimator, which might be of independent interest.
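
As a rough illustration of the setting described in the abstract, the following Python sketch draws rewards from an ε-corrupted arm and compares the empirical mean with a Huber-type location estimate. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the threshold `beta`, the Gaussian clean rewards, and the constant corruption value are all hypothetical choices made for this example. The abstract does not specify how HuberUCB tunes the estimator or builds its confidence bound, so neither is attempted here.

```python
import random


def huber_psi(x, beta):
    """Huber's influence function: identity on [-beta, beta], clipped outside."""
    return max(-beta, min(beta, x))


def huber_mean(samples, beta, iterations=200):
    """Huber location estimate: a root mu of sum_i psi(x_i - mu) = 0.

    The sum is non-increasing in mu, non-negative at min(samples) and
    non-positive at max(samples), so plain bisection finds a root.
    `beta` is a hypothetical threshold chosen by the caller.
    """
    lo, hi = min(samples), max(samples)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        score = sum(huber_psi(x - mid, beta) for x in samples)
        if score > 0:
            lo = mid  # estimate is too small, move right
        else:
            hi = mid  # estimate is too large (or exact), move left
    return (lo + hi) / 2


def corrupted_reward(rng, clean_sampler, corrupt_sampler, eps):
    """Return a corrupted reward with probability eps, a clean one otherwise."""
    if rng.random() < eps:
        return corrupt_sampler(rng)
    return clean_sampler(rng)


def draw_arm(rng, n, eps):
    """Assumed toy arm: clean rewards N(1, 1), corruption fixed at 1000."""
    return [
        corrupted_reward(rng, lambda r: r.gauss(1.0, 1.0), lambda r: 1000.0, eps)
        for _ in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    rewards = draw_arm(rng, n=2000, eps=0.05)
    print("empirical mean:", sum(rewards) / len(rewards))
    print("Huber estimate:", huber_mean(rewards, beta=3.0))
```

In this toy arm the corrupted draws pull the empirical mean far from the clean mean of 1, while each of them can shift the Huber score by at most `beta`, which is the property a robust UCB-type index can build on.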

Keywords

Unbounded corruption; Heavy-tail distributions; Huber's estimator; Regret bounds

Domains

Mathematics [math]; Machine Learning [stat.ML]

Main file

main.pdf (3.18 MB)

article.zip (2.85 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-03611816

Submitted on: Thursday, March 17, 2022, 17:18:33

Last modified on: Monday, April 22, 2024, 13:52:19

Dates and versions

hal-03611816, version 1 (17-03-2022)

Identifiers

HAL Id: hal-03611816, version 1

Cite

Debabrota Basu, Odalric-Ambrym Maillard, Timothée Mathieu. Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm. 2022. ⟨hal-03611816⟩

Collections

CNRS; INRIA; CRISTAL; INRIA2; UNIV-LILLE; CRISTAL-SCOOL
