Preprints, Working Papers, ...

Pathwise uniform value in gambling houses and Partially Observable Markov Decision Processes

Abstract : In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a robust notion of value for the infinitely repeated problem, namely the pathwise uniform value. This solves two open problems. First, this shows that for any ε > 0, the decision-maker has a pure strategy σ which is ε-optimal in any n-stage game, provided that n is big enough (this result was only known for behavior strategies, that is, strategies which use randomization). Second, the strategy σ can be chosen such that under the long-run average payoff criterion, the decision-maker has more than the limit of the n-stage values.
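
For readers unfamiliar with the terminology, the two properties named in the abstract can be sketched in standard notation. The symbols below (stage payoffs r_m in [0,1], the n-stage value v_n, its limit v_∞, and the suppressed dependence on the initial state) are assumptions made for illustration; the abstract does not give them, and the paper's exact definitions may differ.

```latex
% Assumed notation: r_m \in [0,1] is the payoff at stage m, \sigma is a strategy,
% and \mathbb{E}_{\sigma} is the expectation it induces (initial state omitted).
\[
  v_n \;=\; \sup_{\sigma}\, \mathbb{E}_{\sigma}\!\left[\frac{1}{n}\sum_{m=1}^{n} r_m\right],
  \qquad
  v_\infty \;=\; \lim_{n\to\infty} v_n .
\]
% First property: one strategy \sigma is \varepsilon-optimal in every long enough n-stage problem.
\[
  \exists N \;\; \forall n \ge N : \quad
  \mathbb{E}_{\sigma}\!\left[\frac{1}{n}\sum_{m=1}^{n} r_m\right] \;\ge\; v_n - \varepsilon .
\]
% Second property (pathwise, long-run average criterion), written here up to \varepsilon:
\[
  \mathbb{E}_{\sigma}\!\left[\liminf_{n\to\infty}\frac{1}{n}\sum_{m=1}^{n} r_m\right]
  \;\ge\; v_\infty - \varepsilon .
\]
```

In this reading, the second inequality is the stronger one because the liminf is taken along each realized path before the expectation is applied.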

https://hal.archives-ouvertes.fr/hal-01302567
Contributor : Xavier Venel
Submitted on : Thursday, April 14, 2016 - 3:39:17 PM
Last modification on : Friday, April 29, 2022 - 10:13:03 AM
Long-term archiving on : Friday, July 15, 2016 - 1:11:04 PM

File

1505.07495v2.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01302567, version 1
  • ARXIV : 1505.07495

Citation

Xavier Venel, Bruno Ziliotto. Pathwise uniform value in gambling houses and Partially Observable Markov Decision Processes. 2016. ⟨hal-01302567⟩

Metrics

Record views : 155

File downloads : 50