Minimax rate of consistency for linear models with missing values

Alexis Ayme; Claire Boyer; Aymeric Dieuleveut; Erwan Scornet

Preprint, Working Paper. Year: 2022

Alexis Ayme

Role: Author
PersonId : 1124994

Laboratoire de Probabilités, Statistique et Modélisation

Claire Boyer

Role: Author

Laboratoire de Probabilités, Statistique et Modélisation

Méthodes numériques pour le problème de Monge-Kantorovich et Applications en sciences sociales

Aymeric Dieuleveut

Role: Author
PersonId : 1109167
IdHAL : aymeric-dieuleveut
ORCID : 0009-0005-1848-1724

Centre de Mathématiques Appliquées - Ecole Polytechnique

Erwan Scornet

Role: Author

Centre de Mathématiques Appliquées - Ecole Polytechnique

Abstract

Missing values arise in most real-world datasets due to the aggregation of multiple sources and intrinsically missing information (sensor failures, unanswered survey questions, etc.). The very nature of missing values usually prevents us from running standard learning algorithms. In this paper, we focus on the extensively studied linear models, but in the presence of missing values, which turns out to be quite a challenging task. Indeed, the Bayes rule can be decomposed as a sum of predictors, one for each missing pattern. This ultimately requires solving a number of learning tasks that is exponential in the number of input features, which makes prediction impossible for current real-world datasets. First, we propose a rigorous setting to analyze a least-squares-type estimator and establish a bound on the excess risk that increases exponentially in the dimension. Consequently, we leverage the missing data distribution to propose a new algorithm and derive associated adaptive risk bounds that turn out to be minimax optimal. Numerical experiments highlight the benefits of our method compared to state-of-the-art algorithms used for prediction with missing values.
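
The decomposition of the Bayes rule into one predictor per missing pattern can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `fit_pattern_by_pattern` and `predict_pattern_by_pattern`, the synthetic Gaussian data, and the choice of entries missing completely at random with probability 0.3. This naive pattern-by-pattern least squares is not claimed to be the estimator or the new algorithm the authors analyze; it only shows why the number of learning tasks can grow as 2^d with the number of features d.

```python
import numpy as np


def fit_pattern_by_pattern(X, y):
    """Fit one least-squares model per missing pattern (hypothetical helper).

    X encodes missing entries as np.nan. Returns a dict mapping each pattern
    seen in X (tuple of booleans, True = missing) to a coefficient vector:
    intercept first, then one weight per observed feature.
    """
    masks = np.isnan(X)
    models = {}
    for pattern in {tuple(m.tolist()) for m in masks}:
        pattern_arr = np.array(pattern)
        rows = (masks == pattern_arr).all(axis=1)  # samples sharing this pattern
        observed = ~pattern_arr
        # Design matrix: intercept plus the coordinates observed under this pattern.
        A = np.column_stack([np.ones(rows.sum()), X[rows][:, observed]])
        coef, *_ = np.linalg.lstsq(A, y[rows], rcond=None)
        models[pattern] = coef
    return models


def predict_pattern_by_pattern(models, X, fallback=0.0):
    """Predict with the model of each row's pattern; unseen patterns get `fallback`."""
    masks = np.isnan(X)
    preds = np.full(X.shape[0], fallback, dtype=float)
    for i, (x, m) in enumerate(zip(X, masks)):
        coef = models.get(tuple(m.tolist()))
        if coef is not None:
            preds[i] = coef[0] + x[~m] @ coef[1:]
    return preds


# Synthetic example (assumed setup): linear model, independent Gaussian features.
rng = np.random.default_rng(0)
n, d = 2000, 3
X_full = rng.normal(size=(n, d))
beta = np.array([1.0, -2.0, 0.5])
y = X_full @ beta + 0.1 * rng.normal(size=n)

X = X_full.copy()
X[rng.random((n, d)) < 0.3] = np.nan  # each entry is dropped independently

models = fit_pattern_by_pattern(X, y)
print(len(models), "patterns fitted, out of at most", 2 ** d)
```

Each pattern only uses the samples that share it, so with d features the data may be split across up to 2^d separate regressions, and rare patterns are fitted on very few points. This is the difficulty the abstract attributes to the pattern-wise decomposition.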

Domains

Machine Learning [stat.ML]

Main file

main.pdf (735.57 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-03552109

Submitted on: Wednesday, February 2, 2022, 14:22:04

Last modified on: Friday, April 19, 2024, 16:18:58

Dates and versions

hal-03552109 , version 1 (02-02-2022)

hal-03552109 , version 2 (06-12-2022)

Identifiers

HAL Id : hal-03552109 , version 1
ARXIV : 2202.01463

Cite

Alexis Ayme, Claire Boyer, Aymeric Dieuleveut, Erwan Scornet. Minimax rate of consistency for linear models with missing values. 2022. ⟨hal-03552109v1⟩
