Conference paper, 2013

Evaluation of stochastic policies for factored Markov decision processes

Abstract

We are interested in solving general Markov decision processes (MDPs) with factored state and action spaces, called FA-FMDPs [4]. We are particularly interested in reward functions that constrain the system to remain, for as long as possible, in an admissible domain defined by several conditions. Few algorithms can solve FA-FMDPs with both reasonable complexity and reasonable approximation quality. Some recent algorithms, described in [9], can be used with affine algebraic decision diagrams (AADDs), which are better suited to multiplicative rewards than algebraic decision diagrams (ADDs). Their drawback is that they are designed for binary state and action variables and do not scale to variables with more than two modalities. Most other existing methods for solving MDPs with large state and action spaces, such as approximate linear programming [3] or mean-field approaches [10], assume an additive reward. As an alternative, we propose multiplicative rewards as an interesting way of modelling objectives defined as admissible domains; as we will see, this choice also opens the way to approximate policy evaluation methods.

Recently, several decision problems have been solved with inference methods for graphical models, an approach recently called planning as inference [2]. For example, [11] proposes an EM algorithm for solving (non-factored) MDPs, and [6] proposes a belief propagation algorithm for solving influence diagrams. Our idea is to follow this trend and propose an (approximate) policy-iteration-type algorithm [8] for solving FA-FMDPs with multiplicative rewards. Such an algorithm alternates two steps: an evaluation step and an optimization step. In this communication, we focus on the approximate evaluation step of stochastic policies for FA-FMDPs with multiplicative rewards. The method we propose computes the normalization constants of factor graphs of increasing size (one per time step of the unrolled process) using existing variational or belief propagation methods, which remain applicable on large graphs.
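
To make the evaluation step concrete, here is a minimal sketch in Python (NumPy) on a hypothetical toy FA-FMDP: two binary state variables, one binary action, a uniform stochastic policy, and a multiplicative reward r(s) = r1(s1) * r2(s2). All the tables (P, pi, r) and the function policy_value are illustrative assumptions, not the paper's model. The sketch evaluates V(b0) = Σ_t γ^t Z_t, where each Z_t is the normalization constant of the factor graph obtained by unrolling the policy-coupled dynamics over t steps with the reward factors attached at the last time slice; the exact forward contraction used here stands in for the loopy belief propagation or variational approximations the abstract proposes for large graphs.

```python
import itertools
import numpy as np

# Hypothetical toy FA-FMDP: two binary state variables s1, s2, one binary
# action a.  Each state variable evolves through a local table
# P[i][a, s_i, s_i']; the reward is multiplicative, r(s) = r1(s1) * r2(s2),
# so it is large only inside the "admissible domain" (both variables at 1).

# Hypothetical local transition tables, shape (action, s_i, s_i')
P = [
    np.array([[[0.9, 0.1], [0.2, 0.8]],    # variable 1, action 0
              [[0.6, 0.4], [0.3, 0.7]]]),  # variable 1, action 1
    np.array([[[0.8, 0.2], [0.5, 0.5]],    # variable 2, action 0
              [[0.4, 0.6], [0.1, 0.9]]]),  # variable 2, action 1
]

# Hypothetical stochastic policy pi(a | s1, s2), here uniform
pi = np.full((2, 2, 2), 0.5)

# Local reward factors: value 1 is admissible, value 0 barely so
r = [np.array([0.1, 1.0]), np.array([0.1, 1.0])]

def policy_value(b0, gamma=0.95, horizon=50):
    """Evaluate V(b0) = sum_t gamma^t * Z_t, where Z_t = E[r(s_t)] is the
    normalization constant of the factor graph obtained by unrolling the
    policy-coupled dynamics over t steps and attaching the reward factors
    at the last time slice.  The toy state space is tiny, so the joint
    belief b(s1, s2) is contracted exactly; on a real FA-FMDP this step
    would be replaced by loopy belief propagation or a variational
    approximation."""
    b, value = b0.copy(), 0.0
    for t in range(1, horizon + 1):
        nb = np.zeros_like(b)
        for s1, s2, a in itertools.product(range(2), repeat=3):
            w = b[s1, s2] * pi[s1, s2, a]
            for n1, n2 in itertools.product(range(2), repeat=2):
                nb[n1, n2] += w * P[0][a, s1, n1] * P[1][a, s2, n2]
        b = nb
        # Z_t: contract the time-t marginal against the reward factors
        z_t = np.einsum('ij,i,j->', b, r[0], r[1])
        value += gamma**t * z_t
    return value

b0 = np.zeros((2, 2)); b0[0, 0] = 1.0   # start in s = (0, 0)
print(f"approximate discounted value: {policy_value(b0):.4f}")
```

Because the reward is a product of local factors, each Z_t is exactly a partition function, which is why off-the-shelf variational or belief propagation machinery applies; with an additive reward this reduction would not hold.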
Main file
2013-358.pdf (126.69 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01005047, version 1 (11-06-2014)

Identifiers

  • HAL Id: hal-01005047, version 1
  • PRODINRA: 262199

Cite

Julia Radoszycki, Nathalie Dubois Peyrard, Régis Sabbadin. Evaluation of stochastic policies for factored Markov decision processes. 3rd workshop AIGM: Algorithmic issues for inference in graphical models, Sep 2013, Paris, France. ⟨hal-01005047⟩
