Mean field approximation of the policy iteration algorithm for graph-based Markov decision processes - Archive ouverte HAL
Conference paper, Year: 2006

Mean field approximation of the policy iteration algorithm for graph-based Markov decision processes

Abstract

In this article, we consider a compact representation of multidimensional Markov Decision Processes based on graphs (GMDP). The states and actions of a GMDP are multidimensional and attached to the vertices of a graph, allowing the representation of local dynamics and rewards. This approach follows the line of work based on Dynamic Bayesian Networks. For policy optimisation, a direct application of the Policy Iteration algorithm, whose complexity is exponential in the number of nodes of the graph, is not feasible for such high-dimensional problems, so we propose an approximate version of this algorithm derived from the GMDP representation. Rather than approximating the value function directly, as is usually done, we propose an approximation of the occupation measure of the model, based on the mean field principle. We then use it to compute the value function and derive approximate policy evaluation and policy improvement methods. Their combination yields an approximate Policy Iteration algorithm of linear complexity in the number of nodes of the graph. Comparisons with the optimal solution, when available, and with a naive short-term policy demonstrate the quality of the proposed procedure.
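To illustrate the approximate policy evaluation step, here is a minimal Python sketch under simplified assumptions: binary local states, an epidemic-style toy dynamics, and a product-form (mean field) approximation of the occupation measure in which each neighbour's state is replaced by its current marginal probability. The graph, the transition and reward functions, and the threshold policy below are all hypothetical, introduced only to make concrete why each time step costs time linear in the number of nodes; this is not the paper's benchmark model.

# Hypothetical toy GMDP: each node has a binary state ("active"/"inactive").
# Under the mean-field approximation, the occupation measure is replaced by
# a product of per-node marginals p[i] = P(node i is active).

neighbours = {0: [1], 1: [0, 2], 2: [1]}   # a 3-node path graph (illustrative)
gamma = 0.95                                # discount factor

def local_transition(p_self, p_neigh, action):
    # P(next state = 1) for one node; neighbour states are replaced by the
    # mean of their current marginals (the mean-field step).
    base = 0.2 + 0.5 * p_neigh              # contagion pressure from neighbours
    if action == 1:                          # action 1 = "treat" the node
        base *= 0.3
    # Mixture over the node's own current marginal: activation if inactive,
    # persistence (0.8 untreated, 0.2 treated) if already active.
    return (1 - p_self) * base + p_self * (0.8 if action == 0 else 0.2)

def local_reward(p_active, action):
    # Expected local reward under the node's marginal: an inactive node
    # pays 1, treating costs 0.5.
    return (1.0 - p_active) - 0.5 * action

def mean_field_evaluation(policy, p0, horizon=200):
    # Approximate policy evaluation: propagate per-node marginals forward
    # in time and accumulate the discounted sum of expected local rewards.
    # Each step touches every node once: linear in the number of nodes.
    p = dict(p0)
    value = 0.0
    for t in range(horizon):
        actions = {i: policy(i, p) for i in p}
        value += gamma ** t * sum(local_reward(p[i], actions[i]) for i in p)
        p = {i: local_transition(p[i],
                                 sum(p[j] for j in neighbours[i]) / len(neighbours[i]),
                                 actions[i])
             for i in p}
    return value

# A simple local policy: treat a node whenever its marginal exceeds 0.5.
policy = lambda i, p: int(p[i] > 0.5)
print(mean_field_evaluation(policy, {0: 0.1, 1: 0.9, 2: 0.1}))

Because each node's update averages only over its neighbours' marginals, the per-step cost is linear in the number of nodes, which is the complexity property the abstract claims for the full approximate Policy Iteration algorithm.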
No file deposited

Dates and versions

hal-02755289, version 1 (03-06-2020)

Identifiers

  • HAL Id: hal-02755289, version 1
  • PRODINRA: 50323

Cite

Nathalie Peyrard, Régis Sabbadin. Mean field approximation of the policy iteration algorithm for graph-based Markov decision processes. 17th European Conference on Artificial Intelligence (ECAI 2006), Aug 2006, Riva del Garda, Italy. ⟨hal-02755289⟩
