Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling

Abstract: Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-task in planning, game tree search and reinforcement learning. We formalize this learning task as the problem of sequentially testing how the minimum mean among a finite set of distributions compares to a given threshold. We develop refined non-asymptotic lower bounds, which show that optimality mandates very different sampling behavior for a low vs. a high true minimum. We show that Thompson Sampling and the intuitive Lower Confidence Bounds policy each nail only one of these cases. We develop a novel approach that we call Murphy Sampling (MS). Even though it entertains exclusively low true minima, we prove that MS is optimal for both possibilities. We then design advanced self-normalized deviation inequalities, fueling more aggressive stopping rules. We complement our theoretical guarantees by experiments showing that MS works best in practice.
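The abstract does not spell out the sampling rule, so the sketch below is only an illustration of the idea it describes: Murphy Sampling can be read as Thompson Sampling conditioned on the "low minimum" event — draw arm means from the posterior conditioned on the minimum mean lying below the threshold, then pull the arm achieving the sampled minimum. The code assumes unit-variance Gaussian arms with a flat prior and implements the conditioning by simple rejection sampling; the function name `murphy_sample`, the rejection budget, and the fallback rule are all assumptions for this sketch, not the paper's exact procedure (which also pairs the sampling rule with a deviation-based stopping rule, omitted here).

```python
import numpy as np

def murphy_sample(sums, counts, threshold, rng, max_tries=1000):
    """Pick an arm via a Murphy-Sampling-style rule (illustrative sketch).

    Draws a mean vector from the (unit-variance Gaussian, flat-prior)
    posterior, conditioned by rejection on the event
    min_a mu_a < threshold, then returns the arm with the lowest draw.
    """
    means = sums / counts
    stds = 1.0 / np.sqrt(counts)
    for _ in range(max_tries):
        theta = rng.normal(means, stds)
        if theta.min() < threshold:  # keep only "low minimum" posterior draws
            return int(theta.argmin())
    # Fallback if every draw was rejected: pull the empirically lowest arm.
    return int(means.argmin())

# Toy run: 3 Gaussian arms, threshold 0; arm 2 has the (low) minimum mean.
rng = np.random.default_rng(0)
true_means = np.array([0.5, 1.0, -0.3])
counts = np.ones(3)
sums = rng.normal(true_means, 1.0)  # one initial pull per arm
for t in range(200):
    a = murphy_sample(sums, counts, threshold=0.0, rng=rng)
    sums[a] += rng.normal(true_means[a], 1.0)
    counts[a] += 1
print(counts)
```

Because the draws are conditioned on a low minimum, the rule concentrates pulls on the arm most likely to achieve it, which is the behavior the abstract credits with optimality in the low-minimum case.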
Document type: Conference papers

https://hal.archives-ouvertes.fr/hal-01804581
Contributor: Emilie Kaufmann
Submitted on: Thursday, May 31, 2018 - 6:42:17 PM
Last modification on: Tuesday, April 16, 2019 - 5:13:47 PM
Long-term archiving on: Saturday, September 1, 2018 - 3:01:48 PM

Identifiers

  • HAL Id: hal-01804581, version 1
  • arXiv: 1806.00973

Citation

Emilie Kaufmann, Wouter Koolen, Aurélien Garivier. Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling. Advances in Neural Information Processing Systems (NIPS), Dec 2018, Montréal, Canada. ⟨hal-01804581⟩
