Collaborative targeted inference from continuously indexed nuisance parameter estimators

Abstract : Suppose that we wish to infer the value of a statistical parameter at a law from which we sample independent observations. Suppose that this parameter is smooth and that we can define two variation-independent, infinite-dimensional features of the law, its so-called Q- and G-components (comp.), such that if we estimate them consistently at a fast enough product of rates, then we can build a confidence interval (CI) with a given asymptotic level based on a plain targeted minimum loss estimator (TMLE). The estimators of the Q- and G-comp. would typically be by-products of machine learning algorithms. We focus on the case where the machine learning algorithm for the G-comp. is fine-tuned by a real-valued parameter h. Then, a plain TMLE with an h chosen by cross-validation would typically not lend itself to the construction of a CI, because the selection of h would trade off its empirical bias against something akin to the empirical variance of the estimator of the G-comp., as opposed to that of the TMLE. A collaborative TMLE (C-TMLE) might, however, succeed in achieving the relevant trade-off. We prove that this is indeed the case. We construct a C-TMLE and show that, under high-level empirical process conditions, and if there exists an oracle h that makes a bulky remainder term asymptotically Gaussian, then the C-TMLE is asymptotically Gaussian, hence amenable to building a CI provided that its asymptotic variance can be estimated too. The construction hinges on guaranteeing that an additional, well-chosen estimating equation is solved on top of the estimating equation that a plain TMLE solves. The optimal h is chosen by cross-validating an empirical criterion that guarantees the desired trade-off between empirical bias and variance. We illustrate the construction and main result with the inference of the so-called average treatment effect, where the Q-comp. consists of a marginal law and a conditional expectation, and the G-comp. is a propensity score (a conditional probability). We also conduct a multi-faceted simulation study to investigate the empirical properties of the collaborative TMLE when the G-comp. is estimated by the LASSO. Here, h is the bound on the 1-norm of the candidate coefficients. The variety of scenarios sheds light on small- and moderate-sample properties, in the face of low-, moderate- or high-dimensional baseline covariates, and possible positivity violations.
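
For concreteness, the average treatment effect example can be written out as below. This is only a sketch in standard notation that the abstract itself does not fix: the observation O = (W, A, Y) with baseline covariates W, binary treatment A and outcome Y, and the symbols Q_W, \bar{Q}, G, \psi and \beta are all assumptions introduced here for illustration.

```latex
% Assumed notation (not from the abstract): O = (W, A, Y), with A in {0, 1}.
% Q-component: the marginal law Q_W of W and the conditional expectation \bar{Q}.
% G-component: the propensity score G.
\begin{align*}
\bar{Q}(a, w) &= E(Y \mid A = a, W = w), \qquad G(w) = P(A = 1 \mid W = w), \\
\psi &= \int \bigl( \bar{Q}(1, w) - \bar{Q}(0, w) \bigr) \, dQ_W(w).
\end{align*}
% LASSO tuning: candidate coefficient vectors \beta for the estimator of G
% are constrained by the bound h on their 1-norm.
\[
\|\beta\|_1 = \sum_{j} |\beta_j| \le h .
\]
```

In this notation, positivity is the requirement that G(W) stay bounded away from 0 and 1, which is the condition that the simulation scenarios with positivity violation put under stress.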
Document type :
Preprints, Working Papers, ...

https://hal.archives-ouvertes.fr/hal-01759020
Contributor : Antoine Chambaz
Submitted on : Thursday, April 5, 2018 - 9:32:11 AM
Last modification on : Friday, September 20, 2019 - 4:34:03 PM

File

1804.00102.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01759020, version 1

Citation

Cheng Ju, Antoine Chambaz, Mark van der Laan. Collaborative targeted inference from continuously indexed nuisance parameter estimators. 2018. ⟨hal-01759020⟩
