Interval Iteration Algorithm for MDPs and IMDPs

Serge Haddad 1, 2, Benjamin Monmege 3
1 MEXICO - Modeling and Exploitation of Interaction and Concurrency
LSV - Laboratoire Spécification et Vérification [Cachan], ENS Cachan - École normale supérieure - Cachan, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8643
3 MOVE - Modélisation et Vérification
LIS - Laboratoire d'Informatique et Systèmes
Abstract: Markov Decision Processes (MDPs) are a widely used model including both non-deterministic and probabilistic choices. Minimal and maximal probabilities to reach a target set of states, with respect to a policy resolving non-determinism, may be computed by several methods including value iteration. This algorithm, easy to implement and efficient in terms of space complexity, iteratively computes the probabilities of paths of increasing length. However, it raises three issues: (1) defining a stopping criterion ensuring a bound on the approximation, (2) analysing the rate of convergence, and (3) specifying an additional procedure to obtain the exact values once a sufficient number of iterations has been performed. The first two issues are still open and, for the third one, an upper bound on the number of iterations has been proposed. Based on a graph analysis and transformation of MDPs, we address these problems. First, we introduce an interval iteration algorithm, for which the stopping criterion is straightforward. Then, we exhibit its convergence rate. Finally, we significantly improve the upper bound on the number of iterations required to get the exact values. We extend our approach to also deal with Interval Markov Decision Processes (IMDPs), which can be seen as symbolic representations of MDPs.
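The idea of bracketing the reachability probability between two sequences can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the MDP has already been preprocessed so that a single target state and a single sink state are the only absorbing parts, and that every other state reaches one of them with probability 1 under every policy. The dictionary encoding, the function name `interval_iteration`, and the toy model `example` are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding (an assumption, not from the paper):
# state -> list of actions, each action a list of (successor, probability).
MDP = Dict[str, List[List[Tuple[str, float]]]]


def interval_iteration(mdp: MDP, target: str, sink: str,
                       eps: float = 1e-6, maximize: bool = True):
    """Iterate a lower and an upper bound on the reachability probability.

    Assumes a preprocessed MDP in which `target` and `sink` are the only
    absorbing states and all other states reach them almost surely.
    """
    opt = max if maximize else min
    # Lower bound starts at 0 everywhere except the target.
    lower = {s: 1.0 if s == target else 0.0 for s in mdp}
    # Upper bound starts at 1 everywhere except the sink.
    upper = {s: 0.0 if s == sink else 1.0 for s in mdp}

    def step(v: Dict[str, float]) -> Dict[str, float]:
        # One synchronous application of the Bellman operator.
        return {
            s: v[s] if s in (target, sink)
            else opt(sum(p * v[t] for t, p in action) for action in mdp[s])
            for s in mdp
        }

    # Stopping criterion: the two bounds are within eps in every state.
    while max(upper[s] - lower[s] for s in mdp) > eps:
        lower, upper = step(lower), step(upper)
    return lower, upper


# Toy model (hypothetical): s0 chooses between a coin flip and moving to s1.
example: MDP = {
    "s0": [[("target", 0.5), ("sink", 0.5)], [("s1", 1.0)]],
    "s1": [[("target", 0.7), ("sink", 0.2), ("s0", 0.1)]],
    "target": [[("target", 1.0)]],
    "sink": [[("sink", 1.0)]],
}

if __name__ == "__main__":
    lo, hi = interval_iteration(example, "target", "sink")
    print(lo["s0"], hi["s0"])
```

On this toy model the maximal reachability probability from `s0` solves x = 0.7 + 0.1x, that is 7/9, and the returned interval encloses it. Without the preprocessing assumption the upper sequence need not converge to the true value, which is why the sketch states it explicitly.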

https://hal.archives-ouvertes.fr/hal-01809094
Contributor: Benjamin Monmege
Submitted on: Wednesday, June 6, 2018 - 16:55:56
Last modified on: Thursday, February 7, 2019 - 16:04:36
Document(s) archived on: Friday, September 7, 2018 - 13:44:16

File

tcs-version.pdf
Files produced by the author(s)

Identifiers

Citation

Serge Haddad, Benjamin Monmege. Interval Iteration Algorithm for MDPs and IMDPs. Theoretical Computer Science, Elsevier, 2018, 735, pp. 111-131. 〈10.1016/j.tcs.2016.12.003〉. 〈hal-01809094〉
