Zap Q-Learning - A User's Guide - Archive ouverte HAL
Conference paper, Year: 2019

Zap Q-Learning - A User's Guide

Abstract

There are two well-known stochastic approximation techniques known to achieve the optimal rate of convergence (measured in terms of asymptotic variance): the Stochastic Newton-Raphson (SNR) algorithm (a matrix-gain algorithm that resembles the deterministic Newton-Raphson method), and the Ruppert-Polyak averaging technique. This paper surveys new applications of these concepts to Q-learning: (i) The Zap Q-Learning algorithm was introduced by the authors in a NIPS 2017 paper. It is based on a variant of SNR, designed to more closely mimic its deterministic cousin. The algorithm has the optimal rate of convergence under general assumptions, and showed astonishingly quick convergence in numerical examples; these algorithms are surveyed and illustrated here. A potential difficulty in implementing the Zap Q-Learning algorithm is the matrix inversion required at each iteration. (ii) Remedies are proposed based on stochastic approximation variants of two general deterministic techniques: Polyak's momentum algorithms and Nesterov's acceleration technique. Provided the hyper-parameters are chosen with care, the performance of these algorithms can be comparable to that of the Zap algorithm, while the computational complexity per iteration is far lower.
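To make the matrix-gain idea concrete, the following is a minimal sketch of a Zap-style SNR update on a toy linear root-finding problem, not the paper's Q-learning setting. The matrix A, vector b, noise level, step-size schedules, and iteration count are all illustrative assumptions; in Zap Q-learning the Jacobian itself is only observed through noisy samples, which the running estimate A_hat is meant to track.

```python
import numpy as np

# Toy problem: find theta with f(theta) = A @ theta - b = 0,
# observing f only through noisy samples (illustrative setup).
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.3, 0.0],
              [0.1, 1.5, 0.2],
              [0.0, 0.4, 1.0]])
b = np.array([1.0, -2.0, 0.5])
theta_star = np.linalg.solve(A, b)   # true root, for checking only

theta = np.zeros(3)
A_hat = np.eye(3)                    # running estimate of the mean Jacobian

for n in range(1, 20001):
    alpha = 1.0 / n                  # standard SA step size
    gamma = 1.0 / n**0.85            # "high-gain" step for the matrix estimate
    f_obs = A @ theta - b + 0.1 * rng.standard_normal(3)  # noisy f(theta)
    # In Q-learning, A itself would be replaced by a noisy sample here.
    A_hat += gamma * (A - A_hat)
    # Matrix-gain (Newton-Raphson-like) update; the per-iteration solve is
    # the cost that the momentum/Nesterov remedies in (ii) aim to avoid.
    theta -= alpha * np.linalg.solve(A_hat, f_obs)

print(np.linalg.norm(theta - theta_star))
```

The high-gain schedule gamma decaying slower than alpha is what lets A_hat track the Jacobian "faster" than theta moves, mimicking the deterministic Newton-Raphson method.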
No file deposited

Dates and versions

hal-02429733, version 1 (06-01-2020)

Identifiers

Cite

Adithya Devraj, Ana Bušić, Sean Meyn. Zap Q-Learning - A User's Guide. ICC 2019 - Fifth Indian Control Conference, Jan 2019, New Delhi, India. pp.10-15, ⟨10.1109/INDIANCC.2019.8715554⟩. ⟨hal-02429733⟩