Mean field approximation of the policy iteration algorithm for graph-based Markov decision processes - Archive ouverte HAL
Conference paper, Year: 2006

Mean field approximation of the policy iteration algorithm for graph-based Markov decision processes

Abstract

In this article, we consider a compact representation of multidimensional Markov Decision Processes based on graphs (GMDP). The states and actions of a GMDP are multidimensional and attached to the vertices of a graph, allowing the representation of local dynamics and rewards. This approach follows the line of work based on Dynamic Bayesian Networks. For policy optimisation, a direct application of the Policy Iteration algorithm, whose complexity is exponential in the number of nodes of the graph, is not feasible for such high-dimensional problems, so we propose an approximate version of this algorithm derived from the GMDP representation. Rather than approximating the value function directly, as is usually done, we propose an approximation of the occupation measure of the model, based on the mean field principle. We then use it to compute the value function and derive approximate policy evaluation and policy improvement methods. Their combination yields an approximate Policy Iteration algorithm of linear complexity in the number of nodes of the graph. Comparisons with the optimal solution, when available, and with a naive short-term policy demonstrate the quality of the proposed procedure.
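To illustrate the approximate policy evaluation step, here is a minimal Python sketch under simplified assumptions: binary local states, an epidemic-style toy dynamics, and a product-form (mean field) approximation of the occupation measure in which each neighbour's state is replaced by its current marginal probability. The graph, the transition and reward functions, and the threshold policy below are all hypothetical, introduced only to make concrete why each time step costs time linear in the number of nodes; this is not the paper's benchmark model.

# Hypothetical toy GMDP: each node has a binary state ("active"/"inactive").
# Under the mean-field approximation, the occupation measure is replaced by
# a product of per-node marginals p[i] = P(node i is active).

neighbours = {0: [1], 1: [0, 2], 2: [1]}   # a 3-node path graph (illustrative)
gamma = 0.95                                # discount factor

def local_transition(p_self, p_neigh, action):
    # P(next state = 1) for one node; neighbour states are replaced by the
    # mean of their current marginals (the mean-field step).
    base = 0.2 + 0.5 * p_neigh              # contagion pressure from neighbours
    if action == 1:                          # action 1 = "treat" the node
        base *= 0.3
    # Mixture over the node's own current marginal: activation if inactive,
    # persistence (0.8 untreated, 0.2 treated) if already active.
    return (1 - p_self) * base + p_self * (0.8 if action == 0 else 0.2)

def local_reward(p_active, action):
    # Expected local reward under the node's marginal: an inactive node
    # pays 1, treating costs 0.5.
    return (1.0 - p_active) - 0.5 * action

def mean_field_evaluation(policy, p0, horizon=200):
    # Approximate policy evaluation: propagate per-node marginals forward
    # in time and accumulate the discounted sum of expected local rewards.
    # Each step touches every node once: linear in the number of nodes.
    p = dict(p0)
    value = 0.0
    for t in range(horizon):
        actions = {i: policy(i, p) for i in p}
        value += gamma ** t * sum(local_reward(p[i], actions[i]) for i in p)
        p = {i: local_transition(p[i],
                                 sum(p[j] for j in neighbours[i]) / len(neighbours[i]),
                                 actions[i])
             for i in p}
    return value

# A simple local policy: treat a node whenever its marginal exceeds 0.5.
policy = lambda i, p: int(p[i] > 0.5)
print(mean_field_evaluation(policy, {0: 0.1, 1: 0.9, 2: 0.1}))

Because each node's update averages only over its neighbours' marginals, the per-step cost is linear in the number of nodes, which is the complexity property the abstract claims for the full approximate Policy Iteration algorithm.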
No file deposited

Dates and versions

hal-02755289, version 1 (03-06-2020)

Identifiers

  • HAL Id: hal-02755289, version 1
  • PRODINRA: 50323

Cite

Nathalie Peyrard, Régis Sabbadin. Mean field approximation of the policy iteration algorithm for graph-based Markov decision processes. 17th European Conference on Artificial Intelligence (ECAI 2006), Aug 2006, Riva del Garda, Italy. ⟨hal-02755289⟩
