Conference paper. Year: 2018

Action-Constrained Markov Decision Processes With Kullback-Leibler Cost

Abstract

This paper concerns the computation of optimal policies for MDPs in which the one-step reward function contains a cost term modeling Kullback-Leibler divergence with respect to nominal dynamics. Todorov introduced this technique in 2007, showing under general conditions that the solution to the average-reward optimality equations reduces to a simple eigenvector problem. Since then, many authors have sought to apply this technique to control problems and to models of bounded rationality in economics. A crucial assumption is that the input process is essentially unconstrained. For example, if the nominal dynamics include randomness from nature (e.g., the impact of wind on a moving vehicle), then the optimal control solution does not respect the exogenous nature of this disturbance. This paper introduces a technique to solve a more general class of action-constrained MDPs. The main idea is to solve an entire parameterized family of MDPs, in which the parameter is a scalar weighting the one-step reward function. The approach is new and practical even in the original unconstrained formulation.
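To make the eigenvector reduction mentioned above concrete, here is a minimal sketch of the unconstrained (Todorov 2007) setting, not of the action-constrained method developed in the paper. The 3-state chain, the nominal kernel `P0`, the reward vector `r`, and the grid of weights `zeta` are all hypothetical illustrations: with reward `zeta * r(x)` minus the KL cost `D(P(x,·) || P0(x,·))`, the average-reward optimality equation reduces to a Perron eigenvector problem for the tilted kernel `exp(zeta * r(x)) * P0(x, y)`.

```python
# A minimal sketch, assuming a hypothetical 3-state chain; P0, r, and the
# zeta grid are illustrative, not from the paper. Maximizing the average of
#   zeta * r(x) - D(P(x,·) || P0(x,·))
# reduces to the Perron eigenproblem  (diag(exp(zeta*r)) @ P0) v = lam * v,
# with optimal kernel  P*(x,y) proportional to P0(x,y) * v(y).
import numpy as np

P0 = np.array([[0.50, 0.50, 0.00],   # nominal (uncontrolled) dynamics
               [0.25, 0.50, 0.25],
               [0.00, 0.50, 0.50]])
r = np.array([0.0, 1.0, 2.0])        # one-step reward, a function of state only

def solve_tilted(zeta):
    """Solve the eigenvector problem for the weighted reward zeta * r.

    Sweeping zeta over a grid gives the parameterized family of MDPs
    referred to in the abstract.
    """
    P_hat = np.exp(zeta * r)[:, None] * P0        # tilted kernel
    evals, evecs = np.linalg.eig(P_hat)
    k = np.argmax(evals.real)                     # Perron eigenvalue
    lam, v = evals[k].real, np.abs(evecs[:, k].real)
    # Optimal (twisted) transition kernel: P*(x,y) = P0(x,y) v(y) / (P0 v)(x)
    P_star = P0 * v[None, :]
    P_star /= P_star.sum(axis=1, keepdims=True)
    return np.log(lam), P_star                    # optimal average reward, policy

for zeta in [0.5, 1.0, 2.0]:
    eta, P_star = solve_tilted(zeta)
    print(f"zeta={zeta}: optimal average reward {eta:.3f}")
```

Note that this sketch gives the controller full freedom to twist every transition of `P0`; the paper's point is precisely that when part of the randomness is exogenous (nature, not action), this twisting is illegitimate, and the constrained problem is handled instead by solving the family of weighted problems indexed by the scalar parameter.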

Dates and versions

hal-01968536, version 1 (02-01-2019)


Cite

Ana Bušić, Sean Meyn. Action-Constrained Markov Decision Processes With Kullback-Leibler Cost. Conference on Learning Theory (COLT), Jul 2018, Stockholm, Sweden. ⟨hal-01968536⟩