Interval Iteration Algorithm for MDPs and IMDPs

Serge Haddad; Benjamin Monmege

doi:10.1016/j.tcs.2016.12.003

Journal article, Theoretical Computer Science, Year: 2018


Serge Haddad

Role: Author
PersonId: 745039
IdHAL: serge-haddad
ORCID: 0000-0002-1759-1201
IdRef: 032657501

Modeling and Exploitation of Interaction and Concurrency

Université Paris-Saclay

Benjamin Monmege

Role: Author
PersonId: 5434
IdHAL: benjamin-monmege
ORCID: 0000-0002-4717-9955
IdRef: 176828885

Modélisation et Vérification

Abstract

Markov Decision Processes (MDP) are a widely used model including both non-deterministic and probabilistic choices. Minimal and maximal probabilities to reach a target set of states, with respect to a policy resolving non-determinism, may be computed by several methods including value iteration. This algorithm, easy to implement and efficient in terms of space complexity, iteratively computes the probabilities of paths of increasing length. However, it raises three issues: (1) defining a stopping criterion ensuring a bound on the approximation, (2) analysing the rate of convergence, and (3) specifying an additional procedure to obtain the exact values once a sufficient number of iterations has been performed. The first two issues are still open and, for the third one, an upper bound on the number of iterations has been proposed. Based on a graph analysis and transformation of MDPs, we address these problems. First we introduce an interval iteration algorithm, for which the stopping criterion is straightforward. Then we exhibit its convergence rate. Finally we significantly improve the upper bound on the number of iterations required to get the exact values. We extend our approach to also deal with Interval Markov Decision Processes (IMDP) that can be seen as symbolic representations of MDPs.
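The idea of iterating a lower and an upper bound until they meet can be illustrated with a minimal sketch for maximal reachability probabilities. This is not the authors' implementation: the dictionary encoding of the MDP, the function name interval_iteration, and the toy model are assumptions made for illustration. The sketch also assumes the MDP has already been preprocessed so that the only states never left are the target and a sink from which the target is unreachable. The graph transformation the abstract refers to is not reproduced here.

```python
def interval_iteration(mdp, target, sink, epsilon=1e-6):
    """Bracket the maximal reachability probability of every state.

    mdp:    dict state -> dict action -> list of (probability, successor)
            (hypothetical encoding chosen for this sketch)
    target: set of target states (value fixed to 1)
    sink:   set of states that cannot reach the target (value fixed to 0)
    """
    # Lower bound starts at 0 outside the target; upper bound at 1 outside the sink.
    lower = {s: 1.0 if s in target else 0.0 for s in mdp}
    upper = {s: 0.0 if s in sink else 1.0 for s in mdp}

    def step(values):
        # One application of the Bellman operator for maximal reachability.
        new = {}
        for s, actions in mdp.items():
            if s in target or s in sink:
                new[s] = values[s]
            else:
                new[s] = max(
                    sum(p * values[t] for p, t in dist)
                    for dist in actions.values()
                )
        return new

    iterations = 0
    # Stopping criterion: the two bounds are within epsilon in every state.
    while max(upper[s] - lower[s] for s in mdp) > epsilon:
        lower, upper = step(lower), step(upper)
        iterations += 1
    return lower, upper, iterations


# Toy model (an assumption for illustration): state 2 is the target, state 3 the sink.
toy_mdp = {
    0: {"a": [(0.5, 1), (0.5, 3)], "b": [(0.2, 2), (0.8, 3)]},
    1: {"c": [(0.5, 0), (0.5, 2)]},
    2: {},
    3: {},
}
lower, upper, iterations = interval_iteration(toy_mdp, target={2}, sink={3})
print(lower[0], upper[0], iterations)
```

In this toy model, solving the fixed-point equations by hand gives 1/3 for state 0 and 2/3 for state 1, so both intervals returned should enclose those values with width at most epsilon.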

Keywords

Markov decision processes; value iteration; stochastic verification

Domains

Logic in Computer Science [cs.LO]; Computer Science and Game Theory [cs.GT]; Computation and Language [cs.CL]; Formal Languages and Automata Theory [cs.FL]

Main file

tcs-version.pdf (488.24 KB)

Origin: Files produced by the author(s)

Contributor: Benjamin Monmege

https://hal.science/hal-01809094

Submitted on: Wednesday, June 6, 2018, 16:55:56

Last modified on: Friday, March 22, 2024, 18:24:04

Long-term archiving on: Friday, September 7, 2018, 13:44:16

Dates and versions

hal-01809094, version 1 (06-06-2018)

Identifiers

HAL Id: hal-01809094, version 1
DOI: 10.1016/j.tcs.2016.12.003

Cite

Serge Haddad, Benjamin Monmege. Interval Iteration Algorithm for MDPs and IMDPs. Theoretical Computer Science, 2018, 735, pp. 111-131. ⟨10.1016/j.tcs.2016.12.003⟩. ⟨hal-01809094⟩


Collections

UNIV-TLN CNRS INRIA UNIV-AMU ENS-CACHAN INRIA2 UNIV-PARIS-SACLAY LIS-LAB MOVE ENS-PARIS-SACLAY GS-COMPUTER-SCIENCE LMF

275 views

879 downloads
