A. Agarwal, A. Beygelzimer, D. Hsu, J. Langford, and M. Telgarsky, Scalable nonlinear learning with adaptive polynomial expansions, Advances in Neural Information Processing Systems, vol.3, pp.2051-2059, 2014.

J. A. Anderson, Quadratic logistic discrimination, Biometrika, vol.62, pp.149-154, 1975.

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with sparsity-inducing penalties, Foundations and Trends in Machine Learning, vol.4, issue.1, pp.1-106, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00613125

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Structured sparsity through convex optimization, Statistical Science, vol.27, issue.4, pp.450-468, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00621245

H. Bauschke and P. Combettes, Convex analysis and monotone operator theory in Hilbert spaces, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00643354

A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.

J. Bien, J. Taylor, and R. Tibshirani, A lasso for hierarchical interactions, Annals of Statistics, vol.41, issue.3, pp.1111-1141, 2013.

M. Blondel, K. Seki, and K. Uehara, Block coordinate descent algorithms for large-scale sparse multiclass classification, Machine Learning, vol.93, issue.1, pp.31-52, 2013.

J. Bruna and S. Mallat, Invariant scattering convolution networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1872-1886, 2013.

R. Chakraborty and N. R. Pal, Feature selection using a neural framework with controlled redundancy, IEEE Transactions on Neural Networks and Learning Systems, vol.26, issue.1, pp.35-50, 2015.

A. Chambolle and C. Dossal, On the convergence of the iterates of the "Fast Iterative Shrinkage/Thresholding Algorithm", Journal of Optimization Theory and Applications, vol.166, issue.3, pp.968-982, 2015.

A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, Journal of Mathematical Imaging and Vision, vol.40, issue.1, pp.120-145, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00490826

C. C. Chang and C. J. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, vol.2, pp.1-27, 2011.

C. Chaux, J. C. Pesquet, and N. Pustelnik, Nested iterative algorithms for convex constrained image recovery problems, SIAM Journal on Imaging Sciences, vol.2, issue.2, pp.730-762, 2009.

G. Chierchia, N. Pustelnik, J. C. Pesquet, and B. Pesquet-Popescu, Epigraphical projection and proximal tools for solving constrained convex optimization problems, Signal, Image and Video Processing, vol.9, pp.1737-1749, 2015.
URL : https://hal.archives-ouvertes.fr/hal-00744603

G. Chierchia, N. Pustelnik, J. C. Pesquet, and B. Pesquet-Popescu, A proximal approach for sparse multiclass SVM, 2015.
URL : https://hal.archives-ouvertes.fr/hal-02287007

P. Combettes and J. C. Pesquet, Proximal splitting methods in signal processing, Fixed-Point Algorithms for Inverse Problems in Science and Engineering, vol.49, pp.185-212, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00643807

L. Condat, A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms, Journal of Optimization Theory and Applications, vol.158, issue.2, pp.460-479, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00609728

J. C. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra, Efficient projections onto the ℓ1-ball for learning in high dimensions, ICML, ACM International Conference Proceeding Series, vol.307, pp.272-279, 2008.

P. Flandrin, Time-frequency/time-scale analysis, 1999.

J. Gui, Z. Sun, S. Ji, D. Tao, and T. Tan, Feature selection based on structured sparsity: a comprehensive study, IEEE Transactions on Neural Networks and Learning Systems, vol.28, issue.7, pp.1-18, 2016.

N. Hao, Y. Feng, and H. H. Zhang, Model selection for high-dimensional quadratic regression via regularization, Journal of the American Statistical Association, vol.113, issue.522, pp.615-625, 2018.

A. Haris, D. Witten, and N. Simon, Convex modeling of interactions with strong heredity, Journal of Computational and Graphical Statistics, pp.1-31, 2014.

R. Jenatton, J. Mairal, G. Obozinski, and F. Bach, Proximal methods for hierarchical sparse coding, Journal of Machine Learning Research, vol.12, pp.2297-2334, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00516723

M. Jiu, N. Pustelnik, and L. Qi, Multiclass SVM with hierarchical interaction: application to face classification, 26th IEEE International Workshop on Machine Learning for Signal Processing, pp.1-6, 2018.

N. Komodakis and J. C. Pesquet, Playing with duality: an overview of recent primal-dual approaches for solving large-scale optimization problems, IEEE Signal Processing Magazine, vol.32, issue.6, pp.31-54, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01010437

L. Laporte, R. Flamary, S. Canu, S. Dejean, and J. Mothe, Nonconvex regularizations for feature selection in ranking with sparse SVM, IEEE Transactions on Neural Networks and Learning Systems, vol.25, issue.6, pp.1118-1130, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01123818

Y. LeCun and Y. Bengio, Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks, p.3361, 1995.

M. Lim and T. Hastie, Learning interactions through hierarchical group-lasso regularization, Journal of Computational and Graphical Statistics, vol.24, pp.627-654, 2015.

B. Pascal, N. Pustelnik, P. Abry, M. Serres, and V. Vidal, Joint Estimation Of Local Variance And Local Regularity For Texture Segmentation. Application To Multiphase Flow Characterization, 25th IEEE International Conference On Image Processing (ICIP), pp.2092-2096, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01818082

A. Pirayre, C. Couprie, F. Bidard, L. Duval, and J. C. Pesquet, BRANE Cut: Biologically-related a priori network enhancement with graph cuts for gene regulatory network inference, BMC Bioinformatics, vol.16, issue.1, 2015.

S. Y. Rhee, J. Taylor, G. Wadhera, A. Ben-Hur, D. L. Brutlag et al., Genotypic predictors of human immunodeficiency virus type 1 drug resistance, Proceedings of the National Academy of Sciences, vol.103, issue.46, pp.17355-17360, 2006.

R. H. Randles, J. D. Broffitt, J. S. Ramberg, and R. V. Hogg, Generalized linear and quadratic discriminant functions using robust estimates, Journal of the American Statistical Association, vol.73, pp.564-568, 1978.

B. E. Sakar, M. E. Isenkul, C. O. Sakar, A. Sertbas, F. Gurgen et al., Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings, IEEE Journal of Biomedical and Health Informatics, vol.17, issue.4, pp.828-834, 2013.

M. Schmidt, N. Le Roux, and F. Bach, Convergence rates of inexact proximal-gradient methods for convex optimization, Advances in Neural Information Processing Systems, vol.24, pp.1458-1466, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00618152

S. Setzer, Split Bregman algorithm, Douglas-Rachford splitting and frame shrinkage, Scale Space and Variational Methods in Computer Vision (SSVM), vol.5567, pp.464-476, 2009.

Y. She and H. Jiang, Group regularized estimation under structural hierarchy, Journal of the American Statistical Association, 2016.

J. Spilka, J. Frecon, R. Leonarduzzi, N. Pustelnik, P. Abry et al., Sparse Support Vector Machine for Intrapartum Fetal Heart Rate Classification, IEEE Journal of Biomedical and Health Informatics, vol.21, issue.3, pp.664-671, 2017.

R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society, Series B, vol.58, pp.267-288, 1996.

B. C. Vũ, A splitting algorithm for dual monotone inclusions involving cocoercive operators, Advances in Computational Mathematics, vol.38, issue.3, pp.667-681, 2013.

J. Weston, A. Elisseeff, B. Schölkopf, and M. Tipping, Use of the zero-norm with linear models and kernel methods, Journal of Machine Learning Research, vol.3, pp.1439-1461, 2003.

D. M. Witten and R. Tibshirani, Covariance-regularized regression and classification for high dimensional problems, Journal of the Royal Statistical Society Series B: Statistical Methodology, vol.71, issue.3, pp.615-636, 2009.

J. Xu, B. Tang, H. He, and H. Man, Semisupervised feature selection based on relevance and redundancy criteria, IEEE Transactions on Neural Networks and Learning Systems, vol.28, issue.9, pp.1974-1984, 2017.

P. Zhao, G. Rocha, and B. Yu, The composite absolute penalties family for grouped and hierarchical variable selection, The Annals of Statistics, vol.37, issue.6A, pp.3468-3497, 2009.

H. Zou and M. Yuan, The F∞-norm support vector machine, Statistica Sinica, vol.18, pp.379-398, 2008.