Ordinal Decision Models for Markov Decision Processes

Paul Weng
LIP6 - Laboratoire d'Informatique de Paris 6
Abstract: Setting the values of rewards in Markov decision processes (MDPs) may be a difficult task. In this paper, we consider two ordinal decision models for MDPs where only an order over rewards is known. The first, which has recently been proposed for MDPs [23], defines preferences with respect to a reference point. The second model, which can be viewed as the dual of the first, is based on quantiles. Using the first decision model, we give a new interpretation of rewards in standard MDPs, which sheds some interesting light on the preference system used in standard MDPs. The second model, based on quantile optimization, is a new approach for MDPs with ordinal rewards. Although quantile-based optimality is state-dependent, we prove that an optimal stationary deterministic policy exists for a given initial state. Finally, we propose solution methods based on linear programming for optimizing quantiles.
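To illustrate the quantile criterion the abstract refers to, here is a minimal sketch of computing the τ-quantile of a distribution over ordinal rewards. This is not code from the paper; the function name and its parameters are illustrative, and rewards are assumed to carry only a total order (encoded here by list position).

```python
# Hypothetical sketch: tau-quantile of a distribution over ordinal rewards.
# The rewards need no numeric values, only a total order; list position
# encodes that order (worst first). All names here are illustrative.

def ordinal_quantile(ordered_rewards, probs, tau):
    """Return the smallest reward (in the given order) whose cumulative
    probability reaches tau, i.e. the tau-quantile of the distribution."""
    cumulative = 0.0
    for reward, p in zip(ordered_rewards, probs):
        cumulative += p
        if cumulative >= tau - 1e-12:  # small tolerance for float rounding
            return reward
    return ordered_rewards[-1]

# Example: three ordinal outcomes with bad < ok < good.
rewards = ["bad", "ok", "good"]
probs = [0.2, 0.5, 0.3]
print(ordinal_quantile(rewards, probs, 0.5))  # median outcome: "ok"
```

With τ = 0.5 this recovers the median outcome; other values of τ trade off optimism against pessimism, which is the kind of criterion a quantile-optimizing policy would target.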
Document type: Conference paper

Contributor: Lip6 Publications
Submitted on: Thursday, February 11, 2016 - 5:22:16 PM
Last modification on: Thursday, March 21, 2019 - 12:59:40 PM



Paul Weng. Ordinal Decision Models for Markov Decision Processes. European Conference on Artificial Intelligence (ECAI), Aug 2012, Montpellier, France. IOS Press, Frontiers in Artificial Intelligence and Applications, vol. 242, pp. 828-833. DOI: 10.3233/978-1-61499-098-7-828. HAL: hal-01273056.
