S. Athey and S. Wager, Efficient Policy Learning, 2017.

Y. Zhao, D. Zeng, A. Rush, and M. Kosorok, Estimating Individualized Treatment Rules Using Outcome Weighted Learning, Journal of the American Statistical Association, vol.107, issue.499, pp.1106-1118, 2012.
DOI : 10.1080/01621459.2012.695674

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3636816

B. Zhang, A. Tsiatis, M. Davidian, M. Zhang, and E. Laber, Estimating optimal treatment regimes from a classification perspective, Stat, vol.1, issue.1, pp.103-114, 2012.
DOI : 10.1002/sta4.124

URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3640350

D. Rubin and M. Van-der-laan, Statistical Issues and Limitations in Personalized Medicine Research with Clinical Trials, The International Journal of Biostatistics, vol.8, issue.1, 2012.
DOI : 10.1515/1557-4679.1423

A. Luedtke and M. Van-der-laan, Super-learning of an optimal dynamic treatment rule, The International Journal of Biostatistics, vol.12, issue.1, pp.305-332, 2016.

A. Farahmand, Action-gap phenomenon in reinforcement learning, Advances in Neural Information Processing Systems, pp.172-180, 2011.

A. Luedtke and M. Van-der-laan, Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy, The Annals of Statistics, pp.713-742, 2016.

A. Chambaz, W. Zheng, and M. Van-der-laan, Targeted sequential design for targeted learning inference of the optimal treatment rule and its mean reward, The Annals of Statistics, 2017.

A. Luedtke and M. Van-der-laan, Comment, Journal of the American Statistical Association, vol.111, issue.516, pp.1526-1530, 2016.

J. Audibert and A. Tsybakov, Fast learning rates for plug-in classifiers, The Annals of Statistics, pp.608-633, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00160849

V. Koltchinskii, Local Rademacher complexities and oracle inequalities in risk minimization, The Annals of Statistics, pp.2593-2656, 2006.
DOI : 10.1214/009053606000001019

URL : http://projecteuclid.org/download/pdfview_1/euclid.aos/1179935055

A. Sheehy and J. Wellner, Uniform Donsker classes of functions, The Annals of Probability, pp.1983-2030, 1992.
DOI : 10.1214/aop/1176989538

A. Van-der-vaart and J. Wellner, Weak convergence and empirical processes, 1996.

A. Van-der-vaart, Asymptotic statistics, 1998.

M. Van-der-laan and J. Robins, Unified methods for censored longitudinal data and causality, 2003.
DOI : 10.1007/978-0-387-21700-0

M. Van-der-laan and D. Rubin, Targeted Maximum Likelihood Learning, The International Journal of Biostatistics, vol.2, issue.1, 2006.
DOI : 10.2202/1557-4679.1043

M. Van-der-laan and S. Rose, Targeted Learning: Causal Inference for Observational and Experimental Data, 2011.

A. Luedtke and M. Van-der-laan, Corrigendum to: Targeted Learning of the Mean Outcome under an Optimal Dynamic Treatment Rule, Journal of Causal Inference, vol.3, issue.2, pp.267-271, 2016.

M. Van-der-laan and A. Luedtke, Targeted Learning of the Mean Outcome under an Optimal Dynamic Treatment Rule, Journal of Causal Inference, vol.3, issue.1, pp.61-95, 2015.
DOI : 10.1515/jci-2013-0022

V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, and C. Hansen, Double machine learning for treatment and causal parameters, 2016.

W. Zheng and M. Van-der-laan, Targeted Maximum Likelihood Estimation of Natural Direct Effects, The International Journal of Biostatistics, vol.8, issue.1, 2012.
DOI : 10.2202/1557-4679.1361

P. Chaffee and M. Van-der-laan, Targeted minimum loss based estimation based on directly solving the efficient influence curve equation, 2011.

A. Tsybakov, Optimal aggregation of classifiers in statistical learning, The Annals of Statistics, pp.135-166, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00102142

D. Rubin and M. Van-der-laan, A Doubly Robust Censoring Unbiased Transformation, The International Journal of Biostatistics, vol.3, issue.1, 2007.
DOI : 10.2202/1557-4679.1052

M. Van-der-laan and A. Luedtke, Targeted learning of an optimal dynamic treatment, and statistical inference for its mean outcome, 2014.

V. Koltchinskii, Oracle inequalities in empirical risk minimization and sparse recovery problems, Lectures from the 38th Probability Summer School held in Saint-Flour, 2008, Lecture Notes in Mathematics, vol.2033, ISBN 978-3-642-22146-0.
DOI : 10.1007/978-3-642-22147-7

URL : http://link.springer.com/content/pdf/bfm%3A978-3-642-22147-7%2F1.pdf

K. Linn, E. Laber, and L. Stefanski, Interactive Q-Learning for Quantiles, Journal of the American Statistical Association, vol.25, pp.1-37, 2016.
DOI : 10.1080/01621459.2014.937488


A. Chambaz, W. Zheng, and M. Van-der-laan, Targeted sequential design for targeted learning inference of the optimal treatment rule and its mean reward, supplementary material, The Annals of Statistics, 2017.

A. Browder, Mathematical analysis: an introduction, 2012.
DOI : 10.1007/978-1-4612-0715-3