Zap Q-Learning - A User's Guide - Archive ouverte HAL
Conference paper, Year: 2019

Zap Q-Learning - A User's Guide

Abstract

There are two well-known stochastic approximation techniques known to achieve the optimal rate of convergence (measured in terms of asymptotic variance): the Stochastic Newton-Raphson (SNR) algorithm (a matrix-gain algorithm that resembles the deterministic Newton-Raphson method), and the Ruppert-Polyak averaging technique. This paper surveys new applications of these concepts to Q-learning: (i) The Zap Q-Learning algorithm was introduced by the authors in a NIPS 2017 paper. It is based on a variant of SNR, designed to more closely mimic its deterministic cousin. The algorithm has the optimal rate of convergence under general assumptions, and showed astonishingly quick convergence in numerical examples; these algorithms are surveyed and illustrated here. A potential difficulty in implementing the Zap Q-Learning algorithm is the matrix inversion required at each iteration. (ii) Remedies are proposed based on stochastic approximation variants of two general deterministic techniques: Polyak's momentum algorithms and Nesterov's acceleration technique. Provided the hyper-parameters are chosen with care, the performance of these algorithms can be comparable to that of the Zap algorithm, while the computational complexity per iteration is far lower.
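To make the matrix-gain idea concrete, the following is a minimal sketch of a Zap-style SNR update on a toy linear root-finding problem, not the paper's Q-learning setting. The matrix A, vector b, noise level, step-size schedules, and iteration count are all illustrative assumptions; in Zap Q-learning the Jacobian itself is only observed through noisy samples, which the running estimate A_hat is meant to track.

```python
import numpy as np

# Toy problem: find theta with f(theta) = A @ theta - b = 0,
# observing f only through noisy samples (illustrative setup).
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.3, 0.0],
              [0.1, 1.5, 0.2],
              [0.0, 0.4, 1.0]])
b = np.array([1.0, -2.0, 0.5])
theta_star = np.linalg.solve(A, b)   # true root, for checking only

theta = np.zeros(3)
A_hat = np.eye(3)                    # running estimate of the mean Jacobian

for n in range(1, 20001):
    alpha = 1.0 / n                  # standard SA step size
    gamma = 1.0 / n**0.85            # "high-gain" step for the matrix estimate
    f_obs = A @ theta - b + 0.1 * rng.standard_normal(3)  # noisy f(theta)
    # In Q-learning, A itself would be replaced by a noisy sample here.
    A_hat += gamma * (A - A_hat)
    # Matrix-gain (Newton-Raphson-like) update; the per-iteration solve is
    # the cost that the momentum/Nesterov remedies in (ii) aim to avoid.
    theta -= alpha * np.linalg.solve(A_hat, f_obs)

print(np.linalg.norm(theta - theta_star))
```

The high-gain schedule gamma decaying slower than alpha is what lets A_hat track the Jacobian "faster" than theta moves, mimicking the deterministic Newton-Raphson method.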
No file deposited

Dates and versions

hal-02429733, version 1 (06-01-2020)

Identifiers

Cite

Adithya Devraj, Ana Bušić, Sean Meyn. Zap Q-Learning - A User's Guide. ICC 2019 - Fifth Indian Control Conference, Jan 2019, New Delhi, India. pp.10-15, ⟨10.1109/INDIANCC.2019.8715554⟩. ⟨hal-02429733⟩