Conference paper, 2015

Multi-Objective MDPs with Conditional Lexicographic Reward Preferences

Abstract

Sequential decision problems that involve multiple objectives are prevalent. Consider, for example, the driver of a semi-autonomous car who may want to optimize competing objectives such as travel time and the effort associated with manual driving. We introduce a rich model called Lexicographic MDP (LMDP) and a corresponding planning algorithm called LVI that generalize previous work by allowing for conditional lexicographic preferences with slack. We analyze the convergence characteristics of LVI and establish its game-theoretic properties. The performance of LVI in practice is tested on a realistic benchmark problem in the domain of semi-autonomous driving. Finally, we demonstrate how GPU-based optimization can improve the scalability of LVI and other value iteration algorithms for MDPs.
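To make the lexicographic idea concrete, below is a minimal sketch of value iteration with slack over ordered objectives, in Python with NumPy. The function name lvi, its parameters, and the exact slack semantics are illustrative assumptions, not the authors' implementation; in particular, the paper's conditional preferences allow the objective ordering to depend on the state, which this sketch omits.

```python
import numpy as np

def lvi(P, rewards, slack, gamma=0.95, eps=1e-6):
    """Sketch of lexicographic value iteration with slack (illustrative).

    P:       (S, A, S) transition tensor.
    rewards: list of (S, A) reward arrays, highest-priority objective first.
    slack:   per-objective tolerances (eta_i); the paper's exact slack
             semantics may differ -- this is an assumption.
    """
    S, A, _ = P.shape
    mask = np.ones((S, A), dtype=bool)        # admissible actions per state
    for R, eta in zip(rewards, slack):
        V = np.zeros(S)
        while True:
            # Bellman backup restricted to the currently admissible actions.
            Q = np.where(mask, R + gamma * P @ V, -np.inf)
            V_new = Q.max(axis=1)
            if np.abs(V_new - V).max() < eps:
                break
            V = V_new
        # Keep only actions within slack eta of this objective's best value,
        # so lower-priority objectives choose among near-optimal actions.
        mask &= Q >= (V_new[:, None] - eta)
    # Greedy policy for the lowest-priority objective over surviving actions.
    return Q.argmax(axis=1)
```

Because each backup is pure array arithmetic, the same code ports essentially unchanged to a GPU array library such as CuPy, which illustrates the kind of scaling the abstract's GPU remark refers to; the paper's actual GPU implementation is presumably lower-level.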
Main file
hal-01191876.pdf (6.33 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01191876, version 1 (08-09-2015)

Identifiers

  • HAL Id: hal-01191876, version 1

Cite

Kyle Hollins Wray, Shlomo Zilberstein, Abdel-Illah Mouaddib. Multi-Objective MDPs with Conditional Lexicographic Reward Preferences. Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), Jan 2015, Austin, United States. pp. 3418-3424. ⟨hal-01191876⟩