Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes

Loucas Pillaud-Vivien; Alessandro Rudi; Francis Bach

Preprint, Working Paper. Year: 2018


Loucas Pillaud-Vivien

Role: Author

Statistical Machine Learning and Parsimony

Université Paris Sciences et Lettres

Alessandro Rudi

Role: Author
PersonId: 21784
IdHAL: alessandro-rudi
ORCID: 0000-0002-3879-7794
IdRef: 240218043

Statistical Machine Learning and Parsimony

Université Paris Sciences et Lettres

Francis Bach

Role: Author
PersonId: 863086

Statistical Machine Learning and Parsimony

Université Paris Sciences et Lettres

Abstract

We consider stochastic gradient descent (SGD) for least-squares regression with potentially several passes over the data. While several passes have been widely reported to perform better in practice in terms of predictive performance on unseen data, the existing theoretical analysis of SGD suggests that a single pass is statistically optimal. Although this holds for low-dimensional easy problems, we show that for hard problems, multiple passes lead to statistically optimal predictions whereas a single pass does not; we also show that in these hard models, the optimal number of passes over the data increases with sample size. In order to define the notion of hardness and show that our predictive performances are optimal, we consider potentially infinite-dimensional models and notions typically associated with kernel methods, namely, the decay of eigenvalues of the covariance matrix of the features and the complexity of the optimal predictor as measured through the covariance matrix. We illustrate our results on synthetic experiments with non-linear kernel methods and on a classical benchmark with a linear model.
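As a rough illustration of the procedure the abstract refers to, the following minimal sketch runs SGD on a least-squares objective for a chosen number of passes over a fixed sample. It is not taken from the paper: the function names `multipass_sgd` and `mean_squared_error`, the constant step size, the uniform sampling with replacement, the iterate averaging, and the toy noiseless data are all assumptions made for this example, and it only compares in-sample error rather than reproducing the paper's experiments.

```python
import random


def multipass_sgd(X, y, step_size, n_passes, seed=0):
    """Averaged SGD for least squares (illustrative sketch).

    Assumptions of this sketch: constant step size, indices drawn
    uniformly with replacement, and n_passes * n total updates.
    Returns the running average of the iterates.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    theta = [0.0] * d
    theta_avg = [0.0] * d
    for t in range(1, n_passes * n + 1):
        i = rng.randrange(n)
        # Residual of the current linear prediction on sample i.
        residual = sum(th * x for th, x in zip(theta, X[i])) - y[i]
        # Stochastic gradient step on the squared loss of sample i.
        theta = [th - step_size * residual * x for th, x in zip(theta, X[i])]
        # Online update of the average of all iterates so far.
        theta_avg = [a + (th - a) / t for a, th in zip(theta_avg, theta)]
    return theta_avg


def mean_squared_error(theta, X, y):
    """Average squared residual of a linear predictor on (X, y)."""
    total = 0.0
    for xi, yi in zip(X, y):
        prediction = sum(th * x for th, x in zip(theta, xi))
        total += (prediction - yi) ** 2
    return total / len(X)


# Toy data (hypothetical): 50 noiseless points from a 2-dimensional linear model.
data_rng = random.Random(1)
theta_star = [1.0, -2.0]
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(50)]
y = [sum(t * x for t, x in zip(theta_star, xi)) for xi in X]

mse_one = mean_squared_error(multipass_sgd(X, y, 0.05, n_passes=1), X, y)
mse_many = mean_squared_error(multipass_sgd(X, y, 0.05, n_passes=200), X, y)
print(f"1 pass: {mse_one:.4f}   200 passes: {mse_many:.6f}")
```

With a small step size, one pass leaves the averaged iterate far from the least-squares solution on this toy problem, while many passes bring it much closer. The paper's claims concern predictive performance on unseen data under specific hardness assumptions, which this sketch does not model.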

Keywords

Stochastic gradient descent

Domains

Machine Learning [cs.LG]; Optimization and Control [math.OC]; Statistics [math.ST]; Machine Learning [stat.ML]

Main file

multipass_sgd.pdf (1.52 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-01799116

Submitted on: Thursday, 28 June 2018, 15:24:05

Last modified on: Friday, 19 April 2024, 16:18:58

Long-term archiving on: Thursday, 27 September 2018, 06:25:45

Dates and versions

hal-01799116, version 1 (24-05-2018)

hal-01799116, version 2 (28-06-2018)

hal-01799116, version 3 (22-11-2018)

hal-01799116, version 4 (10-01-2019)

Identifiers

HAL Id: hal-01799116, version 2
arXiv: 1805.10074

Cite

Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach. Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes. 2018. ⟨hal-01799116v2⟩

288 views

914 downloads
