Batch Policy Iteration Algorithms for Continuous Domains

Abstract : This paper establishes the link between an adaptation of the policy iteration method to Markov decision processes with continuous state and action spaces and the policy gradient method in which the mean value is differentiated directly with respect to the policy, without parameterization. This link makes it possible to derive sound and practical batch reinforcement learning algorithms for continuous state and action spaces.
Document type : Conference papers

Cited literature : 16 references

https://hal.archives-ouvertes.fr/hal-01629651
Contributor : Matthieu Geist
Submitted on : Monday, November 6, 2017 - 4:09:09 PM
Last modification on : Thursday, April 5, 2018 - 12:30:25 PM

File

ewrl13-2016-submission_3.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01629651, version 1

Citation

Bilal Piot, Matthieu Geist, Olivier Pietquin. Batch Policy Iteration Algorithms for Continuous Domains. European Workshop on Reinforcement Learning (EWRL), 2016, Barcelona, Spain. ⟨hal-01629651⟩
