Secure Cumulative Reward Maximization in Linear Stochastic Bandits - Archive ouverte HAL Accéder directement au contenu
Communication Dans Un Congrès Année : 2020

Secure Cumulative Reward Maximization in Linear Stochastic Bandits

Résumé

The linear stochastic multi-armed bandit is a sequential learning setting, where, at each round, a learner chooses an arm and receives a stochastic reward based on an unknown linear function of the chosen arm. The goal is to collect as much reward as possible. Linear bandits have popular applications such as online recommendation based on user preferences, where obtaining a high reward means recommending an item with high expected rating. We address the security concerns that occur when outsourcing the data and the cumulative reward maximization algorithm to an honest-but-curious cloud. We propose LinUCB-DS, a distributed and secure protocol that achieves the same cumulative reward as the standard LinUCB algorithm, without disclosing to the cloud the linear function used to draw arm rewards. We formally prove the complexity and security properties of LinUCB-DS. We also show that LinUCB-DS can be easily adapted to secure the SpectralUCB algorithm, which improves LinUCB for a class of linear bandits. We show the feasibility of our protocols via a proof-of-concept experimental study using the MovieLens movie recommendation dataset.
Fichier non déposé

Dates et versions

hal-02942694 , version 1 (18-09-2020)

Identifiants

Citer

Radu Ciucanu, Anatole Delabrouille, Pascal Lafourcade, Marta Soare. Secure Cumulative Reward Maximization in Linear Stochastic Bandits. International Conference on Provable and Practical Security (ProvSec), Nov 2020, Conférence online, Singapore. pp.257-277, ⟨10.1007/978-3-030-62576-4_13⟩. ⟨hal-02942694⟩
176 Consultations
4 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More