F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Convex optimization with sparsity-inducing norms. Foundations and Trends in Machine Learning, vol.4, pp.1-106, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00937150

H. H. Bauschke and P. L. Combettes, Convex analysis and monotone operator theory in Hilbert spaces, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01517477

S. Behnel, R. Bradshaw, C. Citro, L. Dalcin, D. S. Seljebotn et al., Cython: The best of both worlds, Computing in Science & Engineering, vol.13, pp.31-39, 2011.

A. Belloni, V. Chernozhukov, and L. Wang, Square-root Lasso: pivotal recovery of sparse signals via conic programming, Biometrika, vol.98, issue.4, pp.791-806, 2011.

A. Boisbunon, R. Flamary, and A. Rakotomamonjy, Active set strategy for high-dimensional non-convex sparse optimization problems, ICASSP, pp.1517-1521, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01025585

R. Bollapragada, D. Scieur, and A. d'Aspremont, Nonlinear Acceleration of Momentum and Primal-Dual Algorithms, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01893921

A. Bonnefoy, V. Emiya, L. Ralaivola, and R. Gribonval, A dynamic screening principle for the lasso, EUSIPCO, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00880787

S. Boyd and L. Vandenberghe, Convex optimization, 2004.

L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller et al., API design for machine learning software: experiences from the scikit-learn project, ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 2013.

S. S. Chen and D. L. Donoho, Atomic decomposition by basis pursuit, SPIE, 1995.

L. El Ghaoui, V. Viallon, and T. Rabbani, Safe feature elimination in sparse supervised learning, J. Pacific Optim, vol.8, issue.4, pp.667-698, 2012.

R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, LIBLINEAR: A library for large linear classification, J. Mach. Learn. Res, vol.9, pp.1871-1874, 2008.

J. Fan and J. Lv, Sure independence screening for ultrahigh dimensional feature space, J. R. Stat. Soc. Ser. B Stat. Methodol, vol.70, issue.5, pp.849-911, 2008.

O. Fercoq and P. Richtárik, Accelerated, parallel and proximal coordinate descent, SIAM J. Optim, vol.25, issue.3, pp.1997-2013, 2015.

O. Fercoq, A. Gramfort, and J. Salmon, Mind the duality gap: safer rules for the lasso, ICML, pp.333-342, 2015.

J. Friedman, T. J. Hastie, H. Höfling, and R. Tibshirani, Pathwise coordinate optimization, Ann. Appl. Stat, vol.1, issue.2, pp.302-332, 2007.

J. Friedman, T. J. Hastie, and R. Tibshirani, Regularization paths for generalized linear models via coordinate descent, J. Stat. Softw, vol.33, issue.1, p.1, 2010.

A. Gramfort, M. Kowalski, and M. Hämäläinen, Mixed-norm estimates for the M/EEG inverse problem using accelerated gradient methods, Phys. Med. Biol, vol.57, issue.7, pp.1937-1961, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00690774

A. Gramfort, M. Luessi, E. Larson, D. A. Engemann, D. Strohmeier et al., MNE software for processing MEG and EEG data, NeuroImage, vol.86, pp.446-460, 2014.

E. Hale, W. Yin, and Y. Zhang, Fixed-point continuation for ℓ1-minimization: Methodology and convergence, SIAM J. Optim, vol.19, issue.3, pp.1107-1130, 2008.

J.-B. Hiriart-Urruty and C. Lemaréchal, Convex analysis and minimization algorithms, vol.II, 1993.

C. Hsieh, M. Sustik, I. Dhillon, and P. Ravikumar, QUIC: Quadratic approximation for sparse inverse covariance estimation, J. Mach. Learn. Res, vol.15, pp.2911-2947, 2014.

T. B. Johnson and C. Guestrin, Blitz: A principled meta-algorithm for scaling sparse optimization, ICML, pp.1171-1179, 2015.

T. B. Johnson and C. Guestrin, A fast, principled working set algorithm for exploiting piecewise linear structure in convex problems, 2018.

P. Karimireddy, A. Koloskova, S. Stich, and M. Jaggi, Efficient Greedy Coordinate Descent for Composite Problems, 2018.

K. Koh, S. Kim, and S. Boyd, An interior-point method for large-scale l1-regularized logistic regression, J. Mach. Learn. Res, vol.8, issue.8, pp.1519-1555, 2007.

M. Kowalski, P. Weiss, A. Gramfort, and S. Anthoine, Accelerating ISTA with an active set strategy, OPT 2011: 4th International Workshop on Optimization for Machine Learning, p.7, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00696992

S. K. Lam, A. Pitrou, and S. Seibert, Numba: A LLVM-based Python JIT Compiler, Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, pp.1-6, 2015.

J. Lee, Y. Sun, and M. Saunders, Proximal Newton-type methods for convex optimization, NIPS, pp.827-835, 2012.

J. Mairal, Sparse coding for machine learning, image processing and computer vision, PhD thesis, 2010.

M. Massias, A. Gramfort, and J. Salmon, From safe screening rules to working sets for faster lasso-type solvers, 10th NIPS Workshop on Optimization for Machine Learning, 2017.

M. Massias, A. Gramfort, and J. Salmon, Celer: a fast solver for the Lasso with dual extrapolation, ICML, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01833398

P. McCullagh and J. A. Nelder, Generalized Linear Models, CRC Monographs on Statistics and Applied Probability Series, 1989.

D. Myers and W. Shih, A constraint selection technique for a class of linear programs, Operations Research Letters, vol.7, issue.4, pp.191-195, 1988.

E. Ndiaye, O. Fercoq, A. Gramfort, and J. Salmon, Gap safe screening rules for sparse multi-task and multi-class models, NIPS, pp.811-819, 2015.

E. Ndiaye, O. Fercoq, A. Gramfort, and J. Salmon, GAP safe screening rules for sparse-group lasso, NIPS, 2016.

E. Ndiaye, O. Fercoq, A. Gramfort, and J. Salmon, Gap safe screening rules for sparsity enforcing penalties, J. Mach. Learn. Res, vol.18, issue.128, pp.1-33, 2017.

J. Nutini, M. Schmidt, and W. Hare, "Active-set complexity" of proximal gradient: how long does it take to find the sparsity pattern?, Optimization Letters, pp.1-11, 2017.

G. Obozinski, B. Taskar, and M. I. Jordan, Joint covariate selection and joint subspace selection for multiple classification problems, Statistics and Computing, vol.20, issue.2, pp.231-252, 2010.

K. Ogawa, Y. Suzuki, and I. Takeuchi, Safe screening of non-support vectors in pathwise SVM computation, ICML, pp.1382-1390, 2013.

F. Palacios-Gomez, L. Lasdon, and M. Engquist, Nonlinear optimization by successive linear programming, Management Science, vol.28, issue.10, pp.1106-1120, 1982.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, J. Mach. Learn. Res, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

D. Perekrestenko, V. Cevher, and M. Jaggi, Faster coordinate descent via adaptive importance sampling, AISTATS, pp.869-877, 2017.

C. Poon, J. Liang, and C. Schoenlieb, Local convergence properties of SAGA/Prox-SVRG and acceleration, ICML, pp.4124-4132, 2018.

P. Richtárik and M. Takáč, Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, Mathematical Programming, vol.144, issue.1-2, pp.1-38, 2014.

S. Rosset, J. Zhu, and T. Hastie, Boosting as a regularized path to a maximum margin classifier, J. Mach. Learn. Res, vol.5, pp.941-973, 2004.

V. Roth and B. Fischer, The group-lasso for generalized linear models: uniqueness of solutions and efficient algorithms, ICML, pp.848-855, 2008.

M. De Santis, S. Lucidi, and F. Rinaldi, A fast active set block coordinate descent algorithm for ℓ1-regularized least squares, SIAM J. Optim, vol.26, issue.1, pp.781-809, 2016.

K. Scheinberg and X. Tang, Complexity of inexact proximal Newton methods, 2013.

D. Scieur, Acceleration in Optimization, 2018.
URL : https://hal.archives-ouvertes.fr/tel-01887163

D. Scieur, A. d'Aspremont, and F. Bach, Regularized nonlinear acceleration, NIPS, pp.712-720, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01384682

N. Simon, J. Friedman, T. J. Hastie, and R. Tibshirani, A sparse-group lasso, J. Comput. Graph. Statist, vol.22, issue.2, pp.231-245, 2013.

Y. Sun, H. Jeong, J. Nutini, and M. Schmidt, Are we there yet? manifold identification of gradient-related proximal methods, AISTATS, pp.1110-1119, 2019.

G. Thompson, F. Tonge, and S. Zionts, Techniques for removing nonbinding constraints and extraneous variables from linear programming problems, Management Science, vol.12, issue.7, pp.588-608, 1966.

R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B Stat. Methodol, vol.58, issue.1, pp.267-288, 1996.

R. Tibshirani, J. Bien, J. Friedman, T. J. Hastie, N. Simon et al., Strong rules for discarding predictors in lasso-type problems, J. R. Stat. Soc. Ser. B Stat. Methodol, vol.74, issue.2, pp.245-266, 2012.

R. J. Tibshirani, The lasso problem and uniqueness, Electron. J. Stat, vol.7, pp.1456-1490, 2013.

R. J. Tibshirani, Dykstra's Algorithm, ADMM, and Coordinate Descent: Connections, Insights, and Extensions, NIPS, pp.517-528, 2017.

P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl, vol.109, issue.3, pp.475-494, 2001.

S. Vaiter, G. Peyré, and J. M. Fadili, Model consistency of partly smooth regularizers, IEEE Trans. Inf. Theory, vol.64, issue.3, pp.1725-1737, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01658847

J. Wang, P. Wonka, and J. Ye, Lasso screening rules via dual polytope projection, 2012.

Z. J. Xiang, Y. Wang, and P. J. Ramadge, Screening tests for lasso problems, IEEE Trans. Pattern Anal. Mach. Intell, p.99, 2016.

G. Yuan, C. Ho, and C. Lin, An improved GLMNET for ℓ1-regularized logistic regression, J. Mach. Learn. Res, vol.13, pp.1999-2030, 2012.

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, J. R. Stat. Soc. Ser. B Stat. Methodol, vol.68, issue.1, pp.49-67, 2006.