Conference paper, 2013

Evaluation of stochastic policies for factored Markov decision processes

Abstract

We are interested in solving general Markov decision processes (MDPs) with factored state and action spaces, called FA-FMDPs [4]. We are particularly interested in reward functions that constrain the system to remain, for as long as possible, in an admissible domain defined by several conditions. Few algorithms can solve FA-FMDPs with both reasonable complexity and reasonable approximation quality. Some recent algorithms, described in [9], can be used with affine algebraic decision diagrams (AADDs), which are better suited to multiplicative rewards than algebraic decision diagrams (ADDs). Their drawback is that they are designed for binary state and action variables and do not scale to variables with more than two modalities. Most other existing methods for solving MDPs with large state and action spaces, such as approximate linear programming [3] or mean-field approaches [10], assume an additive reward. As an alternative, we propose multiplicative rewards as an interesting way of modelling objectives defined as admissible domains; as we will see, this choice also opens the way to approximate policy evaluation methods.

Recently, several decision problems have been solved with inference methods for graphical models, an approach recently called planning as inference [2]. For example, [11] proposes an EM algorithm for solving (non-factored) MDPs, and [6] proposes a belief propagation algorithm for solving influence diagrams. Our idea is to follow this trend and propose an (approximate) policy-iteration-type algorithm [8] for solving FA-FMDPs with multiplicative rewards. Such an algorithm alternates two steps: an evaluation step and an optimization step. In this communication, we focus on the approximate evaluation step of stochastic policies for FA-FMDPs with multiplicative rewards. The method we propose computes the normalization constants of factor graphs of increasing size (one per time step of the unrolled process) using existing variational or belief propagation methods, which remain applicable on large graphs.
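
To make the evaluation step concrete, here is a minimal sketch in Python (NumPy) on a hypothetical toy FA-FMDP: two binary state variables, one binary action, a uniform stochastic policy, and a multiplicative reward r(s) = r1(s1) * r2(s2). All the tables (P, pi, r) and the function policy_value are illustrative assumptions, not the paper's model. The sketch evaluates V(b0) = Σ_t γ^t Z_t, where each Z_t is the normalization constant of the factor graph obtained by unrolling the policy-coupled dynamics over t steps with the reward factors attached at the last time slice; the exact forward contraction used here stands in for the loopy belief propagation or variational approximations the abstract proposes for large graphs.

```python
import itertools
import numpy as np

# Hypothetical toy FA-FMDP: two binary state variables s1, s2, one binary
# action a.  Each state variable evolves through a local table
# P[i][a, s_i, s_i']; the reward is multiplicative, r(s) = r1(s1) * r2(s2),
# so it is large only inside the "admissible domain" (both variables at 1).

# Hypothetical local transition tables, shape (action, s_i, s_i')
P = [
    np.array([[[0.9, 0.1], [0.2, 0.8]],    # variable 1, action 0
              [[0.6, 0.4], [0.3, 0.7]]]),  # variable 1, action 1
    np.array([[[0.8, 0.2], [0.5, 0.5]],    # variable 2, action 0
              [[0.4, 0.6], [0.1, 0.9]]]),  # variable 2, action 1
]

# Hypothetical stochastic policy pi(a | s1, s2), here uniform
pi = np.full((2, 2, 2), 0.5)

# Local reward factors: value 1 is admissible, value 0 barely so
r = [np.array([0.1, 1.0]), np.array([0.1, 1.0])]

def policy_value(b0, gamma=0.95, horizon=50):
    """Evaluate V(b0) = sum_t gamma^t * Z_t, where Z_t = E[r(s_t)] is the
    normalization constant of the factor graph obtained by unrolling the
    policy-coupled dynamics over t steps and attaching the reward factors
    at the last time slice.  The toy state space is tiny, so the joint
    belief b(s1, s2) is contracted exactly; on a real FA-FMDP this step
    would be replaced by loopy belief propagation or a variational
    approximation."""
    b, value = b0.copy(), 0.0
    for t in range(1, horizon + 1):
        nb = np.zeros_like(b)
        for s1, s2, a in itertools.product(range(2), repeat=3):
            w = b[s1, s2] * pi[s1, s2, a]
            for n1, n2 in itertools.product(range(2), repeat=2):
                nb[n1, n2] += w * P[0][a, s1, n1] * P[1][a, s2, n2]
        b = nb
        # Z_t: contract the time-t marginal against the reward factors
        z_t = np.einsum('ij,i,j->', b, r[0], r[1])
        value += gamma**t * z_t
    return value

b0 = np.zeros((2, 2)); b0[0, 0] = 1.0   # start in s = (0, 0)
print(f"approximate discounted value: {policy_value(b0):.4f}")
```

Because the reward is a product of local factors, each Z_t is exactly a partition function, which is why off-the-shelf variational or belief propagation machinery applies; with an additive reward this reduction would not hold.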
Main file
2013-358.pdf (126.69 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01005047, version 1 (11-06-2014)

Identifiers

  • HAL Id: hal-01005047, version 1
  • PRODINRA: 262199

Cite

Julia Radoszycki, Nathalie Dubois Peyrard, Régis Sabbadin. Evaluation of stochastic policies for factored Markov decision processes. 3rd workshop AIGM: Algorithmic issues for inference in graphical models, Sep 2013, Paris, France. ⟨hal-01005047⟩
