Conference papers

Exploiting the sign of the advantage function to learn deterministic policies in continuous domains

Abstract: In the context of learning deterministic policies in continuous domains, we revisit an approach first proposed in Continuous Actor Critic Learning Automaton (CACLA) and later extended in Neural Fitted Actor Critic (NFAC). This approach relies on a policy update different from that of the deterministic policy gradient (DPG). Previous work observed its excellent empirical performance, but a theoretical justification was lacking. To fill this gap, we provide a theoretical explanation that motivates this unorthodox policy update by relating it to another update and making explicit the objective function of the latter. We furthermore discuss the properties of these updates in depth to reach a deeper understanding of the overall approach. In addition, we extend it and propose a new trust-region algorithm, Penalized NFAC (PeNFAC). Finally, we experimentally demonstrate on several classic control problems that it surpasses state-of-the-art algorithms for learning deterministic policies.
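The core idea described in the abstract — updating a deterministic actor toward an explored action only when that action's advantage is positive — can be illustrated with a minimal sketch. This is not the authors' implementation: it is a CACLA-style update with linear function approximation and a TD error as the advantage estimate; all names, dimensions, and learning rates below are illustrative assumptions.

```python
import numpy as np

# Illustrative linear models: deterministic policy pi(s) = W @ s,
# state-value critic V(s) = v @ s. Dimensions chosen arbitrarily.
state_dim, action_dim = 4, 2
W = np.zeros((action_dim, state_dim))   # actor weights
v = np.zeros(state_dim)                 # critic weights
alpha_actor, alpha_critic, gamma = 0.1, 0.1, 0.99

def cacla_step(s, a, r, s_next, done):
    """One CACLA-style update on a transition (s, a, r, s_next).

    The actor is moved toward the explored action a only when the
    TD error (a sample estimate of the advantage) is positive;
    only its sign matters, not its magnitude."""
    global W, v
    td_target = r + (0.0 if done else gamma * (v @ s_next))
    delta = td_target - v @ s           # TD error: advantage estimate
    v += alpha_critic * delta * s       # TD(0) critic update
    if delta > 0:                       # the sign test central to the approach
        # Regression step pulling pi(s) toward the successful action a
        W += alpha_actor * np.outer(a - W @ s, s)
    return delta
```

Note the contrast with DPG: rather than following the gradient of the critic with respect to the action, the actor performs a supervised regression toward actions that outperformed the current policy.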

https://hal.archives-ouvertes.fr/hal-02145083
Contributor: Matthieu Zimmer
Submitted on: Tuesday, June 25, 2019 - 3:58:19 AM
Last modification on: Wednesday, July 3, 2019 - 1:22:03 AM


Identifiers

  • HAL Id: hal-02145083, version 2

Citation

Matthieu Zimmer, Paul Weng. Exploiting the sign of the advantage function to learn deterministic policies in continuous domains. International Joint Conferences on Artificial Intelligence, Aug 2019, Macao, China. ⟨hal-02145083v2⟩
