Scalable Collaborative Targeted Learning for High-Dimensional Data

Cheng J Ju; Susan Gruber; Samuel D Lendle; Antoine Chambaz; Jessica J Franklin; Richard Wyss; Sebastian Schneeweiss; Mark J van Der Laan

Preprint, Working Paper. Year: 2017

Cheng J Ju

Role: Author

School of Public Health

Susan Gruber

Role: Author

Harvard Pilgrim Health Care Institute

Harvard Medical School [Boston]

Samuel D Lendle

Role: Author

School of Public Health

Antoine Chambaz

Role: Author
PersonId: 867345

Modélisation aléatoire de Paris X

Mathématiques Appliquées Paris 5

Jessica J Franklin

Role: Author

Brigham and Women's Hospital [Boston]

Harvard Medical School [Boston]

Richard Wyss

Role: Author

Harvard Medical School [Boston]

Brigham and Women's Hospital [Boston]

Sebastian Schneeweiss

Role: Author
PersonId: 781522
ORCID: 0000-0003-2575-467X

Harvard Medical School [Boston]

Brigham and Women's Hospital [Boston]

Mark J van Der Laan

Role: Author

School of Public Health

Abstract

Robust inference of a low-dimensional parameter in a large semi-parametric model relies on external estimators of infinite-dimensional features of the distribution of the data. Typically, only one of the latter is optimized for the sake of constructing a well-behaved estimator of the low-dimensional parameter of interest. Optimizing more than one of them to achieve a better bias-variance trade-off in the estimation of the parameter of interest is the core idea driving the general template of the collaborative targeted minimum loss-based estimation (C-TMLE) procedure. The original implementation/instantiation of the C-TMLE template can be presented as a greedy forward stepwise C-TMLE algorithm. It does not scale well when the number p of covariates increases drastically. This motivates the introduction of a novel implementation/instantiation of the C-TMLE template in which the covariates are pre-ordered. Its time complexity is O(p), as opposed to the original O(p^2), a remarkable gain. We propose two pre-ordering strategies and suggest a rule of thumb for developing other meaningful strategies. Because it is usually unclear a priori which pre-ordering strategy to choose, we also introduce another implementation/instantiation, called the SL-C-TMLE algorithm, that enables the data-driven choice of the better pre-ordering strategy given the problem at hand. Its time complexity is O(p) as well. The computational burden and relative performance of these algorithms were compared in simulation studies involving fully synthetic data or partially synthetic data based on a real-world large electronic health database, and in analyses of three real, large electronic health databases. In all analyses involving electronic health databases, the greedy C-TMLE algorithm is unacceptably slow. Simulation studies indicate that our scalable C-TMLE and SL-C-TMLE algorithms work well. All C-TMLEs are publicly available in a Julia software package.
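
The complexity contrast stated in the abstract can be illustrated with a minimal Python sketch. It is not the C-TMLE procedure itself and not the authors' Julia package: the function names, the toy `score` function, and the `weights` values are hypothetical, and the sketch assumes that "time complexity" is read as the number of candidate fits evaluated while building the nested sequence of covariate sets. Under that assumption, a greedy forward stepwise search tries every remaining covariate at each step, for p(p+1)/2 evaluations in total, while a pre-ordered search adds covariates in a fixed order, for p evaluations.

```python
from typing import Callable, List, Sequence, Tuple

Score = Callable[[Sequence[int]], float]  # lower is better (hypothetical loss)


def greedy_forward(p: int, score: Score) -> Tuple[List[int], int]:
    """Generic greedy forward stepwise search over p covariates.

    At each step every remaining covariate is tried and the one giving
    the lowest score is added. Returns the selection order and the
    number of score evaluations.
    """
    selected: List[int] = []
    remaining = list(range(p))
    evaluations = 0
    while remaining:
        best_j, best_s = remaining[0], float("inf")
        for j in remaining:
            s = score(selected + [j])
            evaluations += 1
            if s < best_s:
                best_j, best_s = j, s
        selected.append(best_j)
        remaining.remove(best_j)
    return selected, evaluations


def preordered_forward(order: Sequence[int], score: Score) -> Tuple[List[int], int]:
    """Add covariates in a fixed, pre-computed order: one evaluation per step."""
    selected: List[int] = []
    evaluations = 0
    for j in order:
        selected.append(j)
        score(selected)  # one candidate fit per added covariate
        evaluations += 1
    return selected, evaluations


# Toy setup (assumption): each covariate has a fixed "usefulness" weight and
# the score of a subset is minus the sum of its weights.
weights = [0.1, 0.9, 0.5, 0.3]
p = len(weights)


def toy_score(subset: Sequence[int]) -> float:
    return -sum(weights[j] for j in subset)


greedy_order, greedy_evals = greedy_forward(p, toy_score)

# Hypothetical pre-ordering: rank covariates once by their individual weight.
preorder = sorted(range(p), key=lambda j: -weights[j])
pre_order, pre_evals = preordered_forward(preorder, toy_score)

print(greedy_order, greedy_evals)
print(pre_order, pre_evals)
```

In this toy setting both searches visit the covariates in the same order, but the greedy search pays for it with quadratically many evaluations. In practice the quality of a pre-ordered search depends on how well the chosen ordering reflects the covariates' usefulness, which is why the abstract proposes a data-driven choice among pre-ordering strategies.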

Keywords

Observational Study; Propensity Score; Variable Selection; Targeted Minimum Loss-based Estimation; High-dimensional Data; Electronic Healthcare Database

Domains

Statistics [math.ST]

Main file

C-TMLE_SMMR.pdf (434.1 KB)

Origin: Files produced by the author(s)

Contributor: Antoine Chambaz

https://hal.science/hal-01487569

Submitted on: Sunday, 12 March 2017, 18:36:50

Last modified on: Thursday, 11 April 2024, 13:16:13

Long-term archiving on: Tuesday, 13 June 2017, 12:22:46

Dates and versions

hal-01487569, version 1 (12-03-2017)

Identifiers

HAL Id: hal-01487569, version 1

Cite

Cheng J Ju, Susan Gruber, Samuel D Lendle, Antoine Chambaz, Jessica J Franklin, et al. Scalable Collaborative Targeted Learning for High-Dimensional Data. 2017. ⟨hal-01487569⟩

Collections

CNRS MAP5 USPC MODALX UNIV-PARIS-LUMIERES UP-SCIENCES ANR UNIV-PARIS-NANTERRE
