Conference papers

Routine Bandits: Minimizing Regret on Recurring Problems

Hassan Saber 1 Léo Saci 2 Odalric-Ambrym Maillard 1 Audrey Durand 3 
1 Scool - Scool
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract : We study a variant of the multi-armed bandit problem in which a learner each day faces one of B bandit instances, and call it a routine bandit. More specifically, at each period h ∈ [1, H], the same bandit b^h is considered during T > 1 consecutive time steps, but the identity b^h is unknown to the learner. We assume all reward distributions are standard Gaussian. Such a situation typically occurs in recommender systems, where a learner may repeatedly serve the same user whose identity is unknown due to privacy issues. By combining bandit-identification tests with a KLUCB-type strategy, we introduce the KLUCB for Routine Bandits (KLUCB-RB) algorithm. While independently running the KLUCB algorithm at each period leads to a cumulative expected regret of Ω(H log T) after H periods when T → ∞, KLUCB-RB benefits from previous periods by aggregating observations from similar identified bandits, which yields a non-trivial scaling of Ω(log T). This is achieved without knowing which bandit instance KLUCB-RB faces in the current period, and without knowing a priori the number of possible bandit instances. We provide numerical illustrations that confirm the benefit of KLUCB-RB, which uses less information about the problem than existing strategies for similar problems.
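
For unit-variance Gaussian rewards the KL divergence is KL(μ, μ') = (μ − μ')² / 2, so a KLUCB index has a closed form. The sketch below illustrates only the per-period baseline mentioned in the abstract (KLUCB restarted independently at each period); it is not the authors' KLUCB-RB algorithm and contains no bandit-identification test or aggregation across periods. The exploration function log t, the function names `klucb_gaussian_index` and `run_period`, and the simulation loop are assumptions made for illustration.

```python
import math
import random


def klucb_gaussian_index(mean, pulls, t):
    """KL-UCB index for unit-variance Gaussian rewards.

    Largest q such that pulls * (mean - q)**2 / 2 <= log(t).
    The exploration function log(t) is an assumption of this sketch.
    """
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_period(means, T, rng):
    """Run the index policy for T steps on one bandit (hypothetical helper).

    Returns the cumulative pseudo-regret and the pull count of each arm.
    """
    K = len(means)
    counts = [0] * K
    sums = [0.0] * K
    best = max(means)
    regret = 0.0
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1  # initialisation: pull each arm once
        else:
            arm = max(
                range(K),
                key=lambda a: klucb_gaussian_index(sums[a] / counts[a], counts[a], t),
            )
        reward = rng.gauss(means[arm], 1.0)  # standard-deviation-1 Gaussian reward
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret, counts


if __name__ == "__main__":
    rng = random.Random(0)
    regret, counts = run_period([0.0, 1.0], T=2000, rng=rng)
    print("pseudo-regret:", regret, "pull counts:", counts)
```

Calling `run_period` once per period with fresh statistics mimics the independent baseline, whose regret accumulates across the H periods; the abstract's point is that sharing observations between periods that face the same bandit avoids paying this cost each time.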

https://hal.archives-ouvertes.fr/hal-03286539
Contributor : Hassan SABER
Submitted on : Thursday, September 9, 2021 - 1:50:03 PM
Last modification on : Wednesday, September 7, 2022 - 8:14:05 AM
Long-term archiving on : Friday, December 10, 2021 - 6:02:12 PM

Files

ECML2021_RoutineBandits (Camer...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03286539, version 1

Citation

Hassan Saber, Léo Saci, Odalric-Ambrym Maillard, Audrey Durand. Routine Bandits: Minimizing Regret on Recurring Problems. ECML-PKDD 2021, Sep 2021, Bilbao, Spain. ⟨hal-03286539⟩
