Natural Langevin Dynamics for Neural Networks

Yann Ollivier; Gaétan Marceau-Caron

doi:10.1007/978-3-319-68445-1_53

Conference paper, Year: 2017

Yann Ollivier

Role: Author
PersonId: 883809

TAckling the Underspecified

Laboratoire de Recherche en Informatique

Centre National de la Recherche Scientifique

Gaétan Marceau-Caron

Role: Author

Montreal Institute for Learning Algorithms [Montréal]

Abstract

One way to avoid overfitting in machine learning is to use model parameters distributed according to a Bayesian posterior given the data, rather than the maximum likelihood estimator. Stochastic gradient Langevin dynamics (SGLD) is one algorithm to approximate such Bayesian posteriors for large models and datasets. SGLD is standard stochastic gradient descent with a controlled amount of added noise, scaled specifically so that the parameter converges in law to the posterior distribution [WT11, TTV16]. The posterior predictive distribution can be approximated by an ensemble of samples from the trajectory. The choice of the noise variance is known to affect the practical behavior of SGLD: for instance, noise should be smaller for sensitive parameter directions. Theoretically, it has been suggested to use the inverse Fisher information matrix of the model as the variance of the noise, since it is also the variance of the Bayesian posterior [PT13, AKW12, GC11]. But the Fisher matrix is costly to compute for high-dimensional models. Here we use the easily computed Fisher matrix approximations for deep neural networks from [MO16, Oll15]. The resulting natural Langevin dynamics combines the advantages of Amari's natural gradient descent and Fisher-preconditioned Langevin dynamics for large neural networks. Small-scale experiments on MNIST show that Fisher matrix preconditioning brings SGLD close to dropout as a regularizing technique.
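
The SGLD update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy model (the mean of a diagonal Gaussian with known standard deviations `sigma` and a Gaussian prior) for which the inverse Fisher matrix is exactly `diag(sigma**2)`, whereas the paper relies on Fisher approximations for neural networks. The function names `preconditioned_sgld` and `posterior_moments`, the step size, and the batch size are all hypothetical choices made for the illustration.

```python
import numpy as np


def posterior_moments(data, sigma, prior_std):
    """Exact posterior mean and std of theta for the toy model
    x ~ N(theta, diag(sigma^2)), theta ~ N(0, prior_std^2 I).
    Only used to check the sampler against a known answer."""
    n_data = data.shape[0]
    precision = 1.0 / prior_std**2 + n_data / sigma**2
    mean = (data.sum(axis=0) / sigma**2) / precision
    return mean, np.sqrt(1.0 / precision)


def preconditioned_sgld(data, sigma, prior_std=10.0, step=2e-5,
                        batch_size=100, n_steps=20000, burn_in=2000,
                        precondition=True, seed=0):
    """SGLD with a constant diagonal preconditioner (toy assumption)."""
    rng = np.random.default_rng(seed)
    n_data, dim = data.shape
    # Inverse Fisher matrix of N(theta, diag(sigma^2)) w.r.t. theta.
    precond = sigma**2 if precondition else np.ones(dim)
    theta = np.zeros(dim)
    samples = []
    for t in range(n_steps):
        batch = data[rng.integers(0, n_data, size=batch_size)]
        # Stochastic estimate of the gradient of the log posterior.
        grad_log_prior = -theta / prior_std**2
        grad_log_lik = (n_data / batch_size) * (
            (batch - theta) / sigma**2).sum(axis=0)
        # Half-step along the preconditioned gradient, plus Gaussian
        # noise whose variance is step * preconditioner.
        drift = 0.5 * step * precond * (grad_log_prior + grad_log_lik)
        noise = np.sqrt(step * precond) * rng.normal(size=dim)
        theta = theta + drift + noise
        if t >= burn_in:
            samples.append(theta.copy())
    return np.array(samples)


# Synthetic data with one wide and one narrow (sensitive) direction.
rng = np.random.default_rng(1)
sigma = np.array([1.0, 0.1])
data = np.array([1.0, -2.0]) + sigma * rng.normal(size=(1000, 2))

samples = preconditioned_sgld(data, sigma)
post_mean, post_std = posterior_moments(data, sigma, prior_std=10.0)
print("SGLD mean:", samples.mean(axis=0), "exact:", post_mean)
print("SGLD std: ", samples.std(axis=0), "exact:", post_std)
```

With the preconditioner, the injected noise is automatically smaller along the narrow direction, which matches the remark that noise should be smaller for sensitive parameter directions. Averaging predictions over the rows of `samples` would give the ensemble approximation of the posterior predictive distribution mentioned in the abstract.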

Domains

Machine Learning [cs.LG]; Neural Networks [cs.NE]

https://hal.science/hal-01655949

Submitted on: Tuesday, December 5, 2017, 12:12:49

Last modified on: Monday, February 12, 2024, 09:44:03

Dates and versions

hal-01655949 , version 1 (05-12-2017)

Identifiers

HAL Id : hal-01655949 , version 1
ARXIV : 1712.01076
DOI : 10.1007/978-3-319-68445-1_53

Cite

Yann Ollivier, Gaétan Marceau-Caron. Natural Langevin Dynamics for Neural Networks. GSI 2017 - 3rd conference on Geometric Science of Information, Nov 2017, Paris, France. pp.451-459, ⟨10.1007/978-3-319-68445-1_53⟩. ⟨hal-01655949⟩

Collections

CNRS INRIA UMR8623 CENTRALESUPELEC INRIA2 LRI-AO UNIV-PARIS-SACLAY LISN GS-ENGINEERING GS-COMPUTER-SCIENCE LISN-AO
