Practical targeted learning from large data sets by survey sampling

Abstract: We address the practical construction of asymptotic confidence intervals for smooth (i.e., path-wise differentiable), real-valued statistical parameters by targeted learning from independent and identically distributed data, in contexts where the sample size is so large that it poses computational challenges. We observe a summary measure of all the data and select a sub-sample from the complete data set by Poisson rejective sampling with unequal inclusion probabilities based on these summary measures. Targeted learning is then carried out on the easier-to-handle sub-sample. We derive a central limit theorem for the targeted minimum loss estimator (TMLE), which enables the construction of the confidence intervals. The inclusion probabilities can be optimized to reduce the asymptotic variance of the TMLE. We illustrate the procedure with two examples where the parameters of interest are variable importance measures of an exposure (binary or continuous) on an outcome. We also conduct a simulation study and comment on its results.

Keywords: semiparametric inference; survey sampling; targeted minimum loss estimation (TMLE)
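The sampling pipeline described in the abstract (observe a cheap summary measure on every row, draw a fixed-size sub-sample with unequal inclusion probabilities, then estimate from the sub-sample only) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the rule p_i ∝ 1 + |v_i| is a hypothetical choice (the paper optimizes the inclusion probabilities instead), a Horvitz-Thompson weighted mean stands in for the TMLE, and the standard error uses Hájek's variance approximation for rejective sampling, a swapped-in technique rather than the paper's central limit theorem. All function names are hypothetical.

```python
import numpy as np


def inclusion_probabilities(v, n):
    """Hypothetical rule: p_i proportional to 1 + |v_i|, rescaled so that sum(p) == n."""
    h = 1.0 + np.abs(v)
    p = n * h / h.sum()
    if np.any(p > 1.0):
        raise ValueError("some inclusion probabilities exceed 1; choose a smaller n")
    return p


def rejective_sample(p, rng, max_tries=100_000):
    """Poisson rejective sampling: draw independent Bernoulli(p_i) indicators and
    reject the draw until the sample size equals n = round(sum(p))."""
    n = int(round(p.sum()))
    for _ in range(max_tries):
        keep = rng.random(p.size) < p
        if keep.sum() == n:
            return np.flatnonzero(keep)  # sorted indices of the sub-sample
    raise RuntimeError("no sample of the target size was drawn")


def ht_mean_ci(y_s, p_s, N, z=1.96):
    """Horvitz-Thompson estimate of the full-data mean, with a Wald interval based on
    Hajek's variance approximation for rejective (conditional Poisson) sampling."""
    ratio = y_s / p_s
    est = ratio.sum() / N
    a_hat = np.sum((1.0 - p_s) * ratio) / np.sum(1.0 - p_s)
    se = np.sqrt(np.sum((1.0 - p_s) * (ratio - a_hat) ** 2)) / N
    return est, est - z * se, est + z * se


rng = np.random.default_rng(0)
N, n = 2000, 100
v = rng.normal(size=N)            # summary measure, observed for every row
y = 2.0 + v + rng.normal(size=N)  # outcome; the estimator only touches y[idx]
p = inclusion_probabilities(v, n)
idx = rejective_sample(p, rng)
est, lo, hi = ht_mean_ci(y[idx], p[idx], N)
print(f"estimate {est:.3f}, 95% CI [{lo:.3f}, {hi:.3f}], full-data mean {y.mean():.3f}")
```

Each rejection attempt hits the target size with probability roughly 1/sqrt(2π Σ p_i(1 - p_i)), so plain rejection is fine at this scale, while a much larger n would call for a dedicated conditional Poisson sampling algorithm. The realized inclusion probabilities of rejective sampling are only approximately the p_i, which is one reason the variance above is an approximation.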
Document type:
Preprint, working paper
2016

Cited literature: 26 references

https://hal.archives-ouvertes.fr/hal-01339538
Contributor: Emilien Joly
Submitted on: Wednesday, 29 June 2016, 20:16:09
Last modified on: Thursday, 30 November 2017, 01:16:38

Files

TMLE_Sondage.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01339538, version 1
  • arXiv: 1606.09522

Citation

Patrice Bertail, Antoine Chambaz, Emilien Joly. Practical targeted learning from large data sets by survey sampling. 2016. 〈hal-01339538〉


Metrics

Record views: 219
File downloads: 57