Exploiting the sign of the advantage function to learn deterministic policies in continuous domains - HAL open archive
Conference paper Year: 2019

Exploiting the sign of the advantage function to learn deterministic policies in continuous domains

Paul Weng
  • Role: Author
  • PersonId : 952563

Abstract

In the context of learning deterministic policies in continuous domains, we revisit an approach that was first proposed in Continuous Actor Critic Learning Automaton (CACLA) and later extended in Neural Fitted Actor Critic (NFAC). This approach is based on a policy update different from that of deterministic policy gradient (DPG). Previous work has observed its excellent empirical performance, but a theoretical justification is lacking. To fill this gap, we provide a theoretical explanation that motivates this unorthodox policy update by relating it to another update and making explicit the objective function of the latter. We furthermore discuss the properties of these updates in depth to gain a deeper understanding of the overall approach. In addition, we extend it and propose a new trust region algorithm, Penalized NFAC (PeNFAC). Finally, we experimentally demonstrate on several classic control problems that it outperforms state-of-the-art algorithms for learning deterministic policies.
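As a rough illustration of the kind of sign-based policy update the abstract refers to, the following is a minimal sketch of a CACLA-style actor step. It assumes a scalar action and a linear deterministic policy, and it takes the advantage estimate as given (in CACLA, as commonly described, the temporal-difference error plays this role). The function names, the learning rate, and the toy numbers are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of a CACLA-style actor update for a linear deterministic
# policy. Names and values are illustrative assumptions, not the paper's code.

def policy(weights, state):
    """Deterministic linear policy with a scalar action: action = w . s."""
    return sum(w * s for w, s in zip(weights, state))


def cacla_actor_update(weights, state, action, advantage_estimate, lr=0.1):
    """Move the policy output toward `action` only if its advantage is positive.

    Only the sign of the advantage estimate is used; its magnitude is ignored.
    """
    if advantage_estimate <= 0:
        return list(weights)  # no update when the advantage is not positive
    error = action - policy(weights, state)
    # Gradient step on 0.5 * (action - policy(state))**2 w.r.t. the weights.
    return [w + lr * error * s for w, s in zip(weights, state)]


weights = [0.0, 0.0]
state = [1.0, 2.0]
explored_action = 1.0  # action actually executed, e.g. policy output plus noise

better = cacla_actor_update(weights, state, explored_action, advantage_estimate=0.3)
worse = cacla_actor_update(weights, state, explored_action, advantage_estimate=-0.3)
```

In this toy setting, a positive advantage estimate pulls the policy output at `state` toward the explored action, while a negative one leaves the weights unchanged. A deterministic policy gradient step would instead follow the gradient of the critic with respect to the action.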
Main file
Exploiting_the_sign_of_the_advantage_function_to_learn_deterministic_policies_in_continuous_domains (4).pdf (5.43 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02145083, version 1 (01-06-2019)
hal-02145083, version 2 (25-06-2019)

Identifiers

  • HAL Id: hal-02145083, version 2

Cite

Matthieu Zimmer, Paul Weng. Exploiting the sign of the advantage function to learn deterministic policies in continuous domains. International Joint Conferences on Artificial Intelligence, Aug 2019, Macao, China. ⟨hal-02145083v2⟩

Collections

GRID5000 SILECS
63 Views
269 Downloads
