Markov Decision Processes with Ordinal Rewards: Reference Point-Based Preferences

Paul Weng 1
1 DECISION
LIP6 - Laboratoire d'Informatique de Paris 6
Abstract: In a standard Markov decision process (MDP), rewards are assumed to be precisely known and quantitative. This assumption can be too strong in some situations. Even when rewards can be modeled numerically, specifying the reward function is often difficult because it is a cognitively demanding and/or time-consuming task. Moreover, rewards are sometimes qualitative in nature, for instance when they represent qualitative risk levels. In such cases, using standard MDPs directly is problematic, and we propose instead to resort to MDPs with ordinal rewards, in which only a total order over rewards is assumed to be known. In this setting, we explain how reference points can be exploited as an alternative way to define expressive and interpretable preferences.
Document type:
Conference paper
International Conference on Automated Planning and Scheduling, Jan 2011, Freiburg, Germany. International Conference on Automated Planning and Scheduling, 21, pp.282-289

https://hal.archives-ouvertes.fr/hal-01285812
Contributor: Lip6 Publications
Submitted on: Wednesday, March 9, 2016 - 17:15:09
Last modified on: Friday, August 31, 2018 - 09:25:55

Identifiers

  • HAL Id: hal-01285812, version 1

Citation

Paul Weng. Markov Decision Processes with Ordinal Rewards: Reference Point-Based Preferences. International Conference on Automated Planning and Scheduling, Jan 2011, Freiburg, Germany. International Conference on Automated Planning and Scheduling, 21, pp.282-289. 〈hal-01285812〉
