Distribution-based objectives for Markov Decision Processes - HAL Open Archive
Conference paper, 2018

Distribution-based objectives for Markov Decision Processes

Abstract

We consider distribution-based objectives for Markov Decision Processes (MDP). This class of objectives gives rise to an interesting trade-off between full and partial information. As in full observation, the strategy in the MDP can depend on the state of the system, but similar to partial information, the strategy needs to account for all the states at the same time. In this paper, we focus on two safety problems that arise naturally in this context, namely, existential and universal safety. Given an MDP A and a closed and convex polytope H of probability distributions over the states of A, the existential safety problem asks whether there exists some distribution ∆ in H and a strategy of A, such that starting from ∆ and repeatedly applying this strategy keeps the distribution forever in H. The universal safety problem asks whether for all distributions in H, there exists such a strategy of A which keeps the distribution forever in H. We prove that both problems are decidable, with tight complexity bounds: we show that existential safety is PTIME-complete, while universal safety is co-NP-complete. Further, we compare these results with the existential and universal safety problems for Rabin's probabilistic finite-state automata (PFA), the subclass of Partially Observable MDPs which have zero observation. Unlike for MDPs, strategies of PFAs are not state-dependent. In sharp contrast to the PTIME result, we show that existential safety for PFAs is undecidable, whether H has closed or open boundaries. On the other hand, it turns out that universal safety for PFAs is decidable in EXPTIME, with a co-NP lower bound. Finally, we show that an alternate representation of the input polytope allows us to improve the complexity of universal safety for MDPs and PFAs.
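The distribution-based semantics above can be illustrated with a small sketch: a memoryless state-dependent strategy picks one action per state, which induces a single stochastic matrix; the current distribution ∆ is pushed forward by that matrix at each step, and safety asks that it stay inside the polytope H. All names, matrices, and the polytope constraint below are illustrative, not taken from the paper.

```python
import numpy as np

# Toy 2-state MDP with two actions; each matrix row s gives the
# transition probabilities from state s under that action.
# (All numbers here are made up for illustration.)
ACTIONS = {
    "a": np.array([[0.9, 0.1],
                   [0.2, 0.8]]),
    "b": np.array([[0.5, 0.5],
                   [0.0, 1.0]]),
}

def step(delta, choice):
    """One step of the distribution transformer.

    `choice` maps each state to an action; the induced stochastic
    matrix M takes row s from the matrix of the action chosen in
    state s, and the distribution is pushed forward as delta @ M.
    """
    M = np.array([ACTIONS[choice[s]][s] for s in range(len(delta))])
    return delta @ M

def in_polytope(delta, lo=0.3, hi=0.7):
    """Membership in an example polytope H: lo <= delta[0] <= hi."""
    return lo <= delta[0] <= hi

delta = np.array([0.5, 0.5])
for _ in range(10):
    delta = step(delta, choice={0: "a", 1: "a"})
    if not in_polytope(delta):
        print("left H at", delta)
        break
else:
    print("stayed in H for 10 steps; final", delta)
```

With these particular numbers the distribution converges monotonically toward the stationary distribution of action "a" (first component 2/3), so it happens to remain in H; deciding whether *some* strategy keeps it in H forever, from some or from every starting ∆ in H, is exactly the existential/universal safety question studied in the paper.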

Domains

Other [cs.OH]
Main file: AGV18.pdf (844.01 KB)
Origin: files produced by the author(s)

Dates and versions

hal-01933978, version 1 (06-11-2019)


Cite

S. Akshay, Blaise Genest, Nikhil Vyas. Distribution-based objectives for Markov Decision Processes. LICS 2018, the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, Jul 2018, Oxford, United Kingdom. pp.36-45, ⟨10.1145/3209108.3209185⟩. ⟨hal-01933978⟩
