Secure Best Arm Identification in Multi-Armed Bandits

Abstract : The stochastic multi-armed bandit is a classical decision-making model in which an agent repeatedly chooses an action (pulls a bandit arm) and the environment responds with a stochastic outcome (a reward) drawn from an unknown distribution associated with the chosen action. A popular objective for the agent is to identify the arm with the maximum expected reward, known as the best-arm identification problem. We address the privacy concerns that arise in best-arm identification when the data and computations are outsourced to an honest-but-curious cloud. Our main contribution is a distributed protocol that computes the best arm while guaranteeing that (i) no cloud node can learn information about both the rewards and the ranking of the arms, and (ii) no information about the rewards or the ranking can be learned by analyzing the messages exchanged between the cloud nodes. Together, these two properties ensure that the protocol has no single point of failure with respect to security. We rely on the partially homomorphic property of the well-known Paillier cryptosystem as a building block of our protocol. We prove the correctness of our protocol and present proof-of-concept experiments suggesting its practical feasibility.
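To illustrate the building block named in the abstract, the following is a minimal sketch of the additive homomorphism of the Paillier cryptosystem, used here to sum encrypted rewards per arm. It is not the distributed protocol of the paper: the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`, `best_arm_from_encrypted`), the tiny primes, the integer rewards, and the single key holder are all assumptions made for illustration, and the parameters are far too small to be secure.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key generation (hypothetical helper, insecure key size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                # common simple choice of generator
    mu = pow(lam, -1, n)     # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return (n, g), (lam, mu)


def encrypt(pub, m, rng):
    """Encrypt an integer m with 0 <= m < n under the public key."""
    n, g = pub
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    """Recover the plaintext: m = L(c^lam mod n^2) * mu mod n, L(u) = (u-1)/n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def best_arm_from_encrypted(pub, priv, encrypted_rewards):
    """Sum each arm's encrypted rewards, then decrypt only the totals.

    Assumes every arm was pulled the same number of times, so comparing
    sums is equivalent to comparing empirical means.
    """
    totals = {}
    for arm, ciphertexts in encrypted_rewards.items():
        acc = encrypt(pub, 0, random.Random(0))  # encryption of zero
        for c in ciphertexts:
            acc = add_encrypted(pub, acc, c)
        totals[arm] = decrypt(pub, priv, acc)
    return max(totals, key=totals.get), totals


if __name__ == "__main__":
    rng = random.Random(42)
    pub, priv = keygen()
    # Hypothetical Bernoulli-style rewards observed for two arms.
    rewards = {"arm_a": [1, 0, 1, 1], "arm_b": [0, 0, 1, 0]}
    encrypted = {a: [encrypt(pub, x, rng) for x in xs] for a, xs in rewards.items()}
    print(best_arm_from_encrypted(pub, priv, encrypted))
```

In this sketch the party that aggregates ciphertexts never sees an individual reward, but the key holder still learns the per-arm totals. Avoiding that kind of concentration of knowledge in one node is what the protocol described in the abstract is designed to achieve.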

https://hal.archives-ouvertes.fr/hal-02270418
Contributor : Radu Ciucanu
Submitted on : Sunday, August 25, 2019 - 3:00:51 PM
Last modification on : Saturday, September 14, 2019 - 1:43:23 AM

Identifiers

  • HAL Id : hal-02270418, version 1

Citation

Radu Ciucanu, Pascal Lafourcade, Marius Lombard-Platet, Marta Soare. Secure Best Arm Identification in Multi-Armed Bandits. ISPEC 2019 : The 15th International Conference on Information Security Practice and Experience, Nov 2019, Kuala Lumpur, Malaysia. ⟨hal-02270418⟩
