
# Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Affiliations:

• SIERRA - Statistical Machine Learning and Parsimony, DI-ENS - Département d'informatique de l'ENS Paris, CNRS, Inria de Paris
• Thoth - Learning models from massive data, Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann
Abstract: In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle \theta_*, \Phi(U) \rangle$ between the random output $Y$ and the random feature vector $\Phi(U)$, a potentially non-linear transformation of the inputs $U$. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model. The convergence of the iterates to the optimum $\theta_*$ and the decay of the generalization error follow polynomial convergence rates with exponents that both depend on the regularities of the optimum $\theta_*$ and of the feature vectors $\Phi(u)$. We interpret our result in the reproducing kernel Hilbert space framework. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points; the convergence depends on the Sobolev smoothness of the function and of a chosen kernel. Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a. gossip algorithm) on a graph depending on its spectral dimension.
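As a rough illustration (not the authors' code), the algorithm analyzed in the paper, single-pass, fixed step-size stochastic gradient descent on the least-squares risk under a noiseless linear model, can be sketched as follows. The dimension, sample size, step size, and Gaussian features are hypothetical choices made only for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 50            # finite feature dimension (the paper treats the infinite-dimensional case)
n = 10_000        # number of samples; single pass => one SGD step per sample
gamma = 0.5 / d   # fixed step size (illustrative; must be small enough for stability)

theta_star = rng.standard_normal(d) / np.sqrt(d)  # unknown optimum defining Y = <theta_*, Phi(U)>

theta = np.zeros(d)  # SGD iterate
for _ in range(n):
    phi = rng.standard_normal(d)  # sampled feature vector Phi(U)
    y = phi @ theta_star          # noiseless linear output
    # stochastic gradient of the least-squares risk: (<theta, Phi(U)> - Y) Phi(U)
    theta -= gamma * (phi @ theta - y) * phi

# In the noiseless model there is no variance floor: the iterate converges to theta_*
print(np.linalg.norm(theta - theta_star))
```

Because the observations are noiseless, a fixed step size suffices for the error to decay to zero, in contrast with the noisy setting where constant-step SGD plateaus at a noise-dependent level.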
Document type: Conference papers

Cited literature [33 references]

https://hal.archives-ouvertes.fr/hal-02866755
Contributor: Raphaël Berthier
Submitted on : Monday, October 26, 2020 - 5:20:05 PM
Last modification on : Wednesday, June 8, 2022 - 12:50:06 PM

### Files

neurips_2020.pdf
Files produced by the author(s)

### Identifiers

• HAL Id : hal-02866755, version 2
• ARXIV : 2006.08212

### Citation

Raphaël Berthier, Francis Bach, Pierre Gaillard. Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model. NeurIPS '20 - 34th International Conference on Neural Information Processing Systems, Dec 2020, Vancouver, Canada. pp.2576--2586. ⟨hal-02866755v2⟩
