Take $(X_1, X_2)$ such that $X_1$ and $X_2$ are independently generated with Gaussian distribution $\mathcal{N}(0, 1)$. The associated label is designed as follows: if $X_1 > 0$ and $X_2 > 0$, then $Y = 1$ with probability $q$

M. A. Aizerman, E. M. Braverman, and L. I. Rozonoer, Method of Potential Functions in the Theory of Learning Machines, 1970.

S. Arlot and P. Bartlett, Margin-adaptive model selection in statistical learning, Bernoulli, vol.17, issue.2, 2008.
DOI : 10.3150/10-BEJ288
URL : https://hal.archives-ouvertes.fr/hal-00274327

S. Arlot and P. Massart, Data-driven calibration of penalties for least-squares regression, Journal of Machine Learning Research, vol.10, pp.245-279, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00287631

G. Blanchard, C. Schäfer, Y. Rozenholc, and K.-R. Müller, Optimal dyadic decision trees, Machine Learning, vol.66, issue.2-3, pp.209-241, 2007.
DOI : 10.1007/s10994-007-0717-6
URL : https://hal.archives-ouvertes.fr/hal-00264988

S. Boucheron, O. Bousquet, and G. Lugosi, Theory of Classification: a Survey of Some Recent Advances, ESAIM: Probability and Statistics, vol.9, pp.323-375, 2005.
DOI : 10.1051/ps:2005018
URL : https://hal.archives-ouvertes.fr/hal-00017923

L. Breiman, Arcing classifiers, The Annals of Statistics, vol.26, issue.3, pp.801-849, 1998.

L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification And Regression Trees, 1984.

P. A. Chou, T. Lookabaugh, and R. M. Gray, Optimal pruning with applications to tree-structured source coding and modeling, IEEE Transactions on Information Theory, vol.35, issue.2, pp.299-315, 1989.
DOI : 10.1109/18.32124

L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, 1996.
DOI : 10.1007/978-1-4612-0711-5

Y. Freund and R. E. Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, vol.55, issue.1, pp.119-139, 1997.
DOI : 10.1006/jcss.1997.1504

S. B. Gelfand, C. Ravishankar, and E. J. Delp, An iterative growing and pruning algorithm for classification tree design, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.13, issue.2, pp.163-174, 1991.
DOI : 10.1109/34.67645

S. Gey and E. Lebarbier, Using CART to detect multiple changepoints in the mean for large samples, 2008.

S. Gey and T. Mary-Huard, Risk Bounds for Embedded Variable Selection in Classification Trees, IEEE Transactions on Information Theory, vol.60, issue.3, 2011.
DOI : 10.1109/TIT.2014.2298874
URL : https://hal.archives-ouvertes.fr/hal-00613041

S. Gey and E. Nedelec, Model Selection for CART Regression Trees, IEEE Transactions on Information Theory, vol.51, issue.2, pp.658-670, 2005.
DOI : 10.1109/TIT.2004.840903
URL : https://hal.archives-ouvertes.fr/hal-00326549

T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning, 2001.

M. Kohler and A. Krzyżak, On the Rate of Convergence of Local Averaging Plug-In Classification Rules Under a Margin Condition, IEEE Transactions on Information Theory, vol.53, issue.5, pp.1735-1742, 2007.
DOI : 10.1109/TIT.2007.894625

V. Koltchinskii, Local Rademacher complexities and oracle inequalities in risk minimization, The Annals of Statistics, vol.34, issue.6, pp.2593-2656, 2006.
DOI : 10.1214/009053606000001019

V. Koltchinskii, Rejoinder: Local Rademacher complexities and oracle inequalities in risk minimization, The Annals of Statistics, vol.34, issue.6, pp.2593-2656, 2006.
DOI : 10.1214/009053606000001019

G. Lecué, Simultaneous adaptation to the margin and to complexity in classification, The Annals of Statistics, vol.35, issue.4, pp.1698-1721, 2007.
DOI : 10.1214/009053607000000055

G. Lugosi, Pattern classification and learning theory, Principles of Nonparametric Learning, CISM Courses and Lectures, pp.1-56, 2001.

E. Mammen and A. B. Tsybakov, Smooth discrimination analysis, The Annals of Statistics, vol.27, issue.6, pp.1808-1829, 1999.

T. Mary-Huard, Réduction de la dimension et sélection de modèles en classification supervisée, 2006.

P. Massart, Some applications of concentration inequalities to statistics, Annales de la faculté des sciences de Toulouse Mathématiques, vol.9, issue.2, 2000.
DOI : 10.5802/afst.961

P. Massart, Concentration Inequalities and Model Selection: Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, Lecture Notes in Mathematics, vol.1896, 2003.

A. B. Nobel, Recursive partitioning to reduce distortion, IEEE Transactions on Information Theory, vol.43, issue.4, pp.1122-1133, 1997.
DOI : 10.1109/18.605573

A. B. Nobel, Analysis of a complexity-based pruning scheme for classification trees, IEEE Transactions on Information Theory, vol.48, issue.8, pp.2362-2368, 2002.
DOI : 10.1109/TIT.2002.800482

A. B. Nobel and R. A. Olshen, Termination and continuity of greedy growing for tree-structured vector quantizers, IEEE Transactions on Information Theory, vol.42, issue.1, pp.191-205, 1996.
DOI : 10.1109/18.481789

E. Rio, Une inégalité de Bennett pour les maxima de processus empiriques (A Bennett type inequality for maxima of empirical processes), Annales de l'Institut Henri Poincaré (B) Probability and Statistics, vol.38, issue.6, pp.1053-1057, 2002.
DOI : 10.1016/S0246-0203(02)01122-6

M. Sauvé and C. Tuleau, Variable selection through CART, Institut National de Recherche en Informatique et en Automatique, 2006.
DOI : 10.1051/ps/2014006

R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, Boosting the margin: a new explanation for the effectiveness of voting methods, The Annals of Statistics, vol.26, issue.5, pp.1651-1686, 1998.
DOI : 10.1214/aos/1024691352

C. Scott, Tree pruning with subadditive penalties, IEEE Transactions on Signal Processing, vol.53, issue.12, pp.4518-4525, 2005.
DOI : 10.1109/TSP.2005.859220
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.124.3499

C. Scott and R. Nowak, Minimax-optimal classification with dyadic decision trees, IEEE Transactions on Information Theory, vol.52, issue.4, pp.1335-1353, 2006.
DOI : 10.1109/TIT.2006.871056

A. B. Tsybakov, Optimal aggregation of classifiers in statistical learning, The Annals of Statistics, vol.32, issue.1, pp.135-166, 2004.
DOI : 10.1214/aos/1079120131
URL : https://hal.archives-ouvertes.fr/hal-00102142

A. B. Tsybakov and S. A. van de Geer, Square root penalty: Adaptation to the margin in classification and in edge estimation, The Annals of Statistics, vol.33, issue.3, pp.1203-1224, 2005.
DOI : 10.1214/009053604000001066
URL : https://hal.archives-ouvertes.fr/hal-00101837

V. N. Vapnik, Statistical Learning Theory, 1998.

V. N. Vapnik and A. Ya. Chervonenkis, Teoriya raspoznavaniya obrazov. Statisticheskie problemy obucheniya, Izdat. Nauka, 1974.

Wernecke, Possinger, and S. Kalb, Validating Classification Trees, Biometrical Journal, vol.40, issue.8, pp.993-1005, 1998.
DOI : 10.1002/(SICI)1521-4036(199812)40:8<993::AID-BIMJ993>3.0.CO;2-T