On Matrix Momentum Stochastic Approximation and Applications to Q-learning

Adithya M. Devraj; Ana Bušić; Sean Meyn

doi:10.1109/ALLERTON.2019.8919828

Conference paper, Year: 2019

Adithya M. Devraj

Role: Author

Department of Electrical and Computer Engineering [Gainesville]

Ana Bušić

Role: Author
PersonId : 2602
IdHAL : anabusic
ORCID : 0000-0002-4133-3739
IdRef : 144488175

Dynamics of Geometric Networks

Laboratory of Information, Network and Communication Sciences

Sean Meyn

Role: Author

Department of Electrical and Computer Engineering [Gainesville]

Abstract

Stochastic approximation (SA) algorithms are recursive techniques used to obtain the roots of functions that can be expressed as expectations of a noisy parameterized family of functions. In this paper two new SA algorithms are introduced: 1) PolSA, an extension of Polyak’s momentum technique with a specially designed matrix momentum, and 2) NeSA, which can be regarded either as a variant of Nesterov’s acceleration method or as a simplification of PolSA. The rates of convergence of SA algorithms are well understood. Under special conditions, the mean square error of the parameter estimates is bounded by $\sigma^{2}/n+o(1/n)$, where $\sigma^{2} \geq 0$ is an identifiable constant. If these conditions fail, the rate is typically sub-linear. Two well-known SA algorithms ensure a linear rate with the minimal value of the variance $\sigma^{2}$: the Ruppert-Polyak averaging technique and the stochastic Newton-Raphson (SNR) algorithm. It is demonstrated here that, under mild technical assumptions, the PolSA algorithm also achieves this optimality criterion. This result is established via novel coupling arguments: it is shown that the parameter estimates obtained from the PolSA algorithm couple with those of the optimal-variance (but computationally more expensive) SNR algorithm at a rate $O(1/n^{2})$. The newly proposed algorithms are extended to a reinforcement learning setting to obtain new Q-learning algorithms, and numerical results confirm the coupling of PolSA and SNR.
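As a rough illustration of the baseline that the abstract refers to, the sketch below runs a scalar SA recursion together with Ruppert-Polyak averaging. It is not the PolSA, NeSA, or SNR algorithm from the paper. The toy function `noisy_f`, the root `theta_star = 2.0`, the Gaussian noise, and the step-size exponent `rho = 0.7` are all assumptions chosen for illustration only.

```python
import random


def noisy_f(theta, rng, theta_star=2.0, noise_std=1.0):
    """Noisy observation of the (assumed) mean function -(theta - theta_star)."""
    return -(theta - theta_star) + rng.gauss(0.0, noise_std)


def sa_with_averaging(n_iter=20000, rho=0.7, seed=0):
    """Run scalar SA with step size n**(-rho); return (last iterate, averaged iterate)."""
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    theta = 0.0       # current SA iterate
    theta_avg = 0.0   # running average of the iterates (Ruppert-Polyak)
    for n in range(1, n_iter + 1):
        alpha = n ** (-rho)                   # step size decaying slower than 1/n
        theta += alpha * noisy_f(theta, rng)  # basic SA update
        theta_avg += (theta - theta_avg) / n  # incremental mean of theta_1..theta_n
    return theta, theta_avg


if __name__ == "__main__":
    last, averaged = sa_with_averaging()
    print(f"last iterate: {last:.4f}, averaged iterate: {averaged:.4f}")
```

In this toy setting both estimates should approach the assumed root, and the averaged iterate is typically the less noisy of the two, which is the kind of variance behavior the abstract attributes to averaging.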

Domains

Optimization and Control [math.OC]; Machine Learning [cs.LG]; Probability [math.PR]; Artificial Intelligence [cs.AI]

https://hal.science/hal-01968558

Submitted on: Wednesday, January 2, 2019, 19:34:23

Last modified on: Friday, April 19, 2024, 16:18:58

Dates and versions

hal-01968558 , version 1 (02-01-2019)

Identifiers

HAL Id : hal-01968558 , version 1
ARXIV : 1809.06277
DOI : 10.1109/ALLERTON.2019.8919828

Cite

Adithya M. Devraj, Ana Bušić, Sean Meyn. On Matrix Momentum Stochastic Approximation and Applications to Q-learning. 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Sep 2019, Monticello, IL, United States. ⟨10.1109/ALLERTON.2019.8919828⟩. ⟨hal-01968558⟩

Collections

INSTITUT-TELECOM ENS-PARIS CNRS INRIA INRIA2 TDS-MACS PSL SORBONNE-UNIVERSITE SU-SCIENCES ANR
