F. Bach, Consistency of the group Lasso and multiple kernel learning, Journal of Machine Learning Research, vol.9, pp.1179-1225, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00164735

F. Bach, Exploring large feature spaces with hierarchical multiple kernel learning, Advances in Neural Information Processing Systems (NIPS), 2008.
URL : https://hal.archives-ouvertes.fr/hal-00319660

F. Bach and M. I. Jordan, Predictive low-rank decomposition for kernel methods, Proceedings of the 22nd International Conference on Machine Learning, ICML '05, 2005.
DOI : 10.1145/1102351.1102356

F. Bach, G. R. Lanckriet, and M. I. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, Twenty-First International Conference on Machine Learning, ICML '04, 2004.
DOI : 10.1145/1015330.1015424

F. Bach, R. Thibaux, and M. I. Jordan, Computing regularization paths for learning multiple kernels, Advances in Neural Information Processing Systems (NIPS), 2004.

R. Baraniuk, Compressive sensing, IEEE Signal Processing Magazine, vol.24, issue.4, pp.118-121, 2007.

A. Berlinet and C. Thomas-Agnan, Reproducing Kernel Hilbert Spaces in Probability and Statistics, 2003.
DOI : 10.1007/978-1-4419-9096-9

P. J. Bickel, Y. Ritov, and A. Tsybakov, Simultaneous analysis of Lasso and Dantzig selector, The Annals of Statistics, vol.37, issue.4, 2009.
DOI : 10.1214/08-AOS620
URL : https://hal.archives-ouvertes.fr/hal-00401585

C. L. Blake and C. J. Merz, UCI repository of machine learning databases, 1998.

J. F. Bonnans and A. Shapiro, Perturbation analysis of optimization problems, 2000.
DOI : 10.1007/978-1-4612-1394-9

J. M. Borwein and A. S. Lewis, Convex Analysis and Nonlinear Optimization. Number 3 in CMS Books in Mathematics, 2000.

S. Boyd and L. Vandenberghe, Convex Optimization, 2003.

L. Breiman, Random forests, Machine Learning, vol.45, issue.1, pp.5-32, 2001.
DOI : 10.1023/A:1010933404324

L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees, 1984.

H. Brezis, Analyse Fonctionnelle, 1980.

P. J. Cameron, Combinatorics: Topics, Techniques, Algorithms, 1994.

E. Candès and M. Wakin, An Introduction To Compressive Sampling, IEEE Signal Processing Magazine, vol.25, issue.2, pp.21-30, 2008.
DOI : 10.1109/MSP.2007.914731

O. Chapelle and A. Rakotomamonjy, Second order optimization of kernel parameters, NIPS Workshop on Kernel Learning, 2008.

J. B. Conway, A Course in Functional Analysis, 1997.

M. Cuturi and K. Fukumizu, Kernels on structured objects through nested histograms, Advances in Neural Information Processing Systems (NIPS), 2006.

A. d'Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. Lanckriet, A Direct Formulation for Sparse PCA Using Semidefinite Programming, SIAM Review, vol.49, issue.3, pp.434-448, 2007.
DOI : 10.1137/050645506

L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, 1996.
DOI : 10.1007/978-1-4612-0711-5

B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression, The Annals of Statistics, vol.32, issue.2, pp.407-499, 2004.

S. Fine and K. Scheinberg, Efficient SVM training using low-rank kernel representations, Journal of Machine Learning Research, vol.2, pp.243-264, 2001.

Y. Freund and R. E. Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, vol.55, issue.1, pp.119-139, 1997.
DOI : 10.1006/jcss.1997.1504

J. H. Friedman, Greedy function approximation: A gradient boosting machine, The Annals of Statistics, vol.29, issue.5, pp.1189-1232, 2001.
DOI : 10.1214/aos/1013203451

J. H. Friedman, Multivariate Adaptive Regression Splines, The Annals of Statistics, vol.19, issue.1, pp.1-67, 1991.
DOI : 10.1214/aos/1176347963
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.382.970

K. Fukumizu, F. Bach, and A. Gretton, Statistical convergence of kernel canonical correlation analysis, Journal of Machine Learning Research, vol.8, issue.8, 2007.

M. Grant and S. Boyd, CVX: Matlab software for disciplined convex programming, 2008.

K. Grauman and T. Darrell, The pyramid match kernel: Efficient learning with sets of features, Journal of Machine Learning Research, vol.8, pp.725-760, 2007.

C. Gu, Smoothing Spline ANOVA Models, 2002.

Z. Harchaoui, F. Bach, and E. Moulines, Testing for homogeneity with kernel Fisher discriminant analysis, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00270806

T. J. Hastie and R. J. Tibshirani, Generalized Additive Models, 1990.

J. Huang, T. Zhang, and D. Metaxas, Learning with structured sparsity, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553429

R. Jenatton, J. Audibert, and F. Bach, Structured variable selection with sparsity-inducing norms, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00377732

V. Koltchinskii and M. Yuan, Sparse recovery in large ensembles of kernel machines, Proceedings of the Conference on Learning Theory (COLT), 2008.

G. R. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble, A statistical framework for genomic data fusion, Bioinformatics, vol.20, issue.16, pp.2626-2635, 2004.
DOI : 10.1093/bioinformatics/bth294

G. R. Lanckriet, N. Cristianini, L. El Ghaoui, P. Bartlett, and M. I. Jordan, Learning the kernel matrix with semidefinite programming, Journal of Machine Learning Research, vol.5, pp.27-72, 2004.

H. Lee, A. Battle, R. Raina, and A. Ng, Efficient sparse coding algorithms, Advances in Neural Information Processing Systems (NIPS), 2007.

C. Lemaréchal and C. Sagastizábal, Practical Aspects of the Moreau-Yosida Regularization: Theoretical Preliminaries, SIAM Journal on Optimization, vol.7, issue.2, pp.867-895, 1997.
DOI : 10.1137/S1052623494267127

Y. Lin and H. H. Zhang, Component selection and smoothing in multivariate nonparametric regression, The Annals of Statistics, vol.34, issue.5, pp.2272-2297, 2006.
DOI : 10.1214/009053606000000722
URL : http://arxiv.org/abs/math/0702659

M. S. Lobo, L. Vandenberghe, S. Boyd, and H. Lébret, Applications of second-order cone programming, Linear Algebra and its Applications, vol.284, issue.1-3, pp.193-228, 1998.
DOI : 10.1016/S0024-3795(98)10032-0

H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, and C. Watkins, Text classification using string kernels, Journal of Machine Learning Research, vol.2, pp.419-444, 2002.

G. Loosli, S. Canu, S. Vishwanathan, A. Smola, and M. Chattopadhyay, Boîte à outils SVM simple et rapide, Revue d'Intelligence Artificielle, vol.19, issue.4-5, p.741, 2005.

K. Lounici, M. Pontil, A. B. Tsybakov, and S. A. van de Geer, Taking advantage of sparsity in multi-task learning, Proceedings of the twenty-second Annual Conference on Learning Theory (COLT), 2009.

P. Massart, Concentration Inequalities and Model Selection: École d'Été de Probabilités de Saint-Flour 23, 2003.

C. A. Micchelli, Y. Xu, and H. Zhang, Universal kernels, Journal of Machine Learning Research, vol.7, pp.2651-2667, 2006.

Y. Nardi and A. Rinaldo, On the asymptotic properties of the group lasso estimator for linear models, Electronic Journal of Statistics, vol.2, issue.0, pp.605-633, 2008.
DOI : 10.1214/08-EJS200

G. Obozinski, B. Taskar, and M. I. Jordan, Joint covariate selection and joint subspace selection for multiple classification problems, Statistics and Computing, vol.8, issue.68, 2009.
DOI : 10.1007/s11222-008-9111-x

B. A. Olshausen and D. J. Field, Sparse coding with an overcomplete basis set: A strategy employed by V1?, Vision Research, vol.37, issue.23, pp.3311-3325, 1997.
DOI : 10.1016/S0042-6989(97)00169-7

C. S. Ong, A. J. Smola, and R. C. Williamson, Learning the kernel with hyperkernels, Journal of Machine Learning Research, vol.6, pp.1043-1071, 2005.

J. Platt, Fast training of support vector machines using sequential minimal optimization, Advances in Kernel Methods: Support Vector Learning, 1998.

C. A. Micchelli and M. Pontil, Learning the kernel function via regularization, Journal of Machine Learning Research, vol.6, pp.1099-1125, 2005.

C. E. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, 2006.

P. Ravikumar, H. Liu, J. Lafferty, and L. Wasserman, SpAM: Sparse additive models, Advances in Neural Information Processing Systems (NIPS), 2008.
DOI : 10.1111/j.1467-9868.2009.00718.x

R. T. Rockafellar, Convex Analysis, 1970.
DOI : 10.1515/9781400873173

V. Roth, The Generalized LASSO, IEEE Transactions on Neural Networks, vol.15, issue.1, pp.16-28, 2004.
DOI : 10.1109/TNN.2003.809398

V. Roth and B. Fischer, The Group-Lasso for generalized linear models, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390263

B. Schölkopf and A. J. Smola, Learning with Kernels, 2002.

J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, 2004.
DOI : 10.1017/CBO9780511809682

S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, Large scale multiple kernel learning, Journal of Machine Learning Research, vol.7, pp.1531-1565, 2006.

N. Srebro and S. Ben-David, Learning Bounds for Support Vector Machines with Learned Kernels, Proceedings of the Conference on Learning Theory (COLT), 2006.
DOI : 10.1007/11776420_15

I. Steinwart, On the influence of the kernel on the consistency of support vector machines, Journal of Machine Learning Research, vol.2, pp.67-93, 2002.

M. Szafranski, Y. Grandvalet, and A. Rakotomamonjy, Composite kernel learning, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390287
URL : https://hal.archives-ouvertes.fr/hal-00316016

R. Tibshirani, Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society, Series B, vol.58, issue.1, pp.267-288, 1996.

M. Varma and B. R. Babu, More generality in efficient multiple kernel learning, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553510

G. Wahba, Spline Models for Observational Data, SIAM, 1990.
DOI : 10.1137/1.9781611970128

M. J. Wainwright, Sharp thresholds for noisy and high-dimensional recovery of sparsity using ℓ1-constrained quadratic programming (Lasso), IEEE Transactions on Information Theory, 2009.

C. K. Williams and M. Seeger, The effect of the input density distribution on kernel-based classifiers, Proceedings of the International Conference on Machine Learning (ICML), 2000.

T. T. Wu and K. Lange, Coordinate descent algorithms for lasso penalized regression, The Annals of Applied Statistics, vol.2, issue.1, pp.224-244, 2008.
DOI : 10.1214/07-AOAS147SUPP

Y. Ying and C. Campbell, Generalization bounds for learning the kernel, Proceedings of the twenty-second Annual Conference on Learning Theory (COLT), 2009.

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.68, issue.1, pp.49-67, 2006.

M. Yuan and Y. Lin, On the non-negative garrotte estimator, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.69, issue.2, pp.143-161, 2007.

T. Zhang, Some sharp performance bounds for least squares regression with L 1 regularization, The Annals of Statistics, vol.37, issue.5A, 2009.
DOI : 10.1214/08-AOS659

T. Zhang, On the consistency of feature selection using greedy least squares regression, Journal of Machine Learning Research, vol.10, pp.555-568, 2009.

P. Zhao and B. Yu, On model selection consistency of Lasso, Journal of Machine Learning Research, vol.7, pp.2541-2563, 2006.

P. Zhao, G. Rocha, and B. Yu, Grouped and hierarchical model selection through composite absolute penalties, Annals of Statistics, 2009.

H. Zou, The Adaptive Lasso and Its Oracle Properties, Journal of the American Statistical Association, vol.101, issue.476, pp.1418-1429, 2006.
DOI : 10.1198/016214506000000735

H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.67, issue.2, pp.301-320, 2005.
DOI : 10.1111/j.1467-9868.2005.00503.x