Exploiting the sign of the advantage function to learn deterministic policies in continuous domains

Matthieu Zimmer; Paul Weng

Conference paper, Year: 2019


Matthieu Zimmer

Role: Author
PersonId: 9288
IdHAL: matthieu-zimmer
ORCID: 0000-0002-8029-308X

Shanghai Jiao Tong University [Shanghai]

Paul Weng

Role: Author
PersonId: 952563

Shanghai Jiao Tong University [Shanghai]

Abstract

In the context of learning deterministic policies in continuous domains, we revisit an approach that was first proposed in Continuous Actor Critic Learning Automaton (CACLA) and later extended in Neural Fitted Actor Critic (NFAC). This approach is based on a policy update different from that of deterministic policy gradient (DPG). Previous work has observed its excellent empirical performance, but a theoretical justification is lacking. To fill this gap, we provide a theoretical explanation that motivates this unorthodox policy update by relating it to another update and making explicit the objective function of the latter. We furthermore discuss the properties of these updates in depth to gain a deeper understanding of the overall approach. In addition, we extend it and propose a new trust region algorithm, Penalized NFAC (PeNFAC). Finally, we experimentally demonstrate on several classic control problems that it outperforms state-of-the-art algorithms for learning deterministic policies.
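As a rough illustration of the kind of sign-based policy update the abstract refers to, the sketch below follows the CACLA rule as it is commonly described: the actor is moved toward the explored action only when the temporal-difference error (used as an advantage estimate) is positive, and the step size does not depend on the magnitude of that error. The linear policy, the function name `cacla_actor_step`, and the learning rate are assumptions made for this example; this is not the PeNFAC algorithm proposed in the paper.

```python
def cacla_actor_step(theta, state, action, td_error, lr=0.1):
    """One CACLA-style actor update for a linear policy mu(s) = theta . s.

    Hypothetical helper for illustration: theta and state are lists of
    floats, action is the explored (noisy) scalar action, and td_error
    is the critic's temporal-difference error for that transition.
    """
    if td_error <= 0.0:
        # Non-positive advantage estimate: leave the policy unchanged.
        return list(theta)
    # Current deterministic action for this state.
    mu = sum(w * x for w, x in zip(theta, state))
    # Move mu(s) toward the explored action. Only the sign of td_error
    # was used above; its magnitude does not scale the step.
    return [w + lr * (action - mu) * x for w, x in zip(theta, state)]
```

For instance, with `theta = [0.0, 0.0]`, `state = [1.0, 0.0]` and explored action `1.0`, a positive TD error shifts the first weight toward the action, while a negative TD error returns the weights unchanged.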

Keywords

Reinforcement learning; Actor Critic

Domains

Artificial Intelligence [cs.AI]

Main file

Exploiting_the_sign_of_the_advantage_function_to_learn_deterministic_policies_in_continuous_domains (4).pdf (5.43 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-02145083

Submitted on: Tuesday, June 25, 2019, 03:58:19

Last modified on: Monday, May 4, 2020, 11:37:33

Dates and versions

hal-02145083, version 1 (01-06-2019)

hal-02145083, version 2 (25-06-2019)

Identifiers

HAL Id: hal-02145083, version 2

Cite

Matthieu Zimmer, Paul Weng. Exploiting the sign of the advantage function to learn deterministic policies in continuous domains. International Joint Conferences on Artificial Intelligence, Aug 2019, Macao, China. ⟨hal-02145083v2⟩

Collections

GRID5000 SILECS

63 views

269 downloads
