Solving Hidden-Semi-Markov-Mode Markov Decision Problems

Emmanuel Hadoux (1), Aurélie Beynier (1), Paul Weng (2)
(1) SMA - Systèmes Multi-Agents, LIP6 - Laboratoire d'Informatique de Paris 6
(2) DECISION, LIP6 - Laboratoire d'Informatique de Paris 6
Abstract: Hidden-Mode Markov Decision Processes (HM-MDPs) were proposed to represent sequential decision-making problems in non-stationary environments that evolve according to a Markov chain. In this paper, we introduce Hidden-Semi-Markov-Mode Markov Decision Processes (HS3MDPs), a generalization of HM-MDPs to the more realistic case of non-stationary environments that evolve according to a semi-Markov chain. Like HM-MDPs, HS3MDPs form a subclass of Partially Observable Markov Decision Processes (POMDPs). Large instances of HS3MDPs (and HM-MDPs) can therefore be solved with an online algorithm, Partially Observable Monte Carlo Planning (POMCP), which is based on Monte Carlo Tree Search and uses particle filters to approximate belief states. We propose a first adaptation of POMCP that solves HS3MDPs more efficiently by exploiting their structure. Our empirical results show that this first adaptation reaches higher cumulative rewards than the original POMCP algorithm. In larger instances, however, POMCP may run out of particles. To address this issue, we propose a second adaptation of POMCP that replaces particle filters with exact representations of beliefs. Our empirical results indicate that this second version reaches high cumulative rewards faster than the first adaptation and remains efficient even on large problems.
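To make the idea of an exact belief representation concrete, here is a minimal Python sketch of a Bayesian update of a belief over hidden modes. It is not the authors' implementation. It covers only the simpler Markov-mode (HM-MDP) case; in an HS3MDP the belief would also have to track how long the current mode is expected to last. The function name, the data layout, and the ordering (the mode switches first, then the new mode generates the next state) are all assumptions made for illustration.

```python
def update_mode_belief(belief, mode_trans, state_trans, s, a, s_next):
    """Exact Bayesian update of a belief over hidden modes (illustrative sketch).

    belief[m]               : current probability of mode m
    mode_trans[m][m2]       : probability of switching from mode m to mode m2
    state_trans[m2][(s, a)] : dict mapping next state -> probability under mode m2

    Assumption: the mode switches first, then the new mode produces s_next.
    """
    n = len(belief)
    # Prediction step: push the belief through the mode Markov chain.
    predicted = [
        sum(belief[m] * mode_trans[m][m2] for m in range(n)) for m2 in range(n)
    ]
    # Correction step: weight each mode by the likelihood of the observed transition.
    weighted = [
        predicted[m2] * state_trans[m2][(s, a)].get(s_next, 0.0) for m2 in range(n)
    ]
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("observed transition has zero probability under every mode")
    # Normalize so the updated belief sums to one.
    return [w / total for w in weighted]


# Hypothetical two-mode example: mode 0 tends to stay in state 0,
# mode 1 tends to move to state 1.
belief = [0.5, 0.5]
mode_trans = [[0.9, 0.1], [0.1, 0.9]]
state_trans = [
    {(0, 0): {0: 0.8, 1: 0.2}},
    {(0, 0): {0: 0.2, 1: 0.8}},
]
new_belief = update_mode_belief(belief, mode_trans, state_trans, s=0, a=0, s_next=1)
```

Unlike a particle filter, this update keeps the full probability vector, so it cannot run out of particles; its cost grows with the square of the number of modes for each step.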
Document type: Conference papers

https://hal.archives-ouvertes.fr/hal-01215264
Contributor: Lip6 Publications
Submitted on: Tuesday, October 13, 2015 - 5:48:20 PM
Last modification on: Thursday, March 21, 2019 - 12:59:07 PM

Identifiers

  • HAL Id: hal-01215264, version 1

Citation

Emmanuel Hadoux, Aurélie Beynier, Paul Weng. Solving Hidden-Semi-Markov-Mode Markov Decision Problems. AAMAS Workshop Adaptative Learning Agents, ALA 2014, May 2014, Paris, France. ⟨hal-01215264⟩
