R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B, vol.58, issue.1, pp.267-288, 1996.

S. Chen, D. Donoho, and M. Saunders, Atomic Decomposition by Basis Pursuit, SIAM Journal on Scientific Computing, vol.20, issue.1, pp.33-61, 1999.
DOI : 10.1137/S1064827596304010

Y. Li and S. Amari, Two conditions for equivalence of 0-norm solution and 1-norm solution in sparse representation, IEEE Transactions on Neural Networks, vol.21, issue.7, pp.1189-1196, 2010.

D. Donoho, For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution, Communications on Pure and Applied Mathematics, vol.59, issue.6, pp.797-829, 2006.
DOI : 10.1002/cpa.20132

S. Shevade and S. Keerthi, A simple and efficient algorithm for gene selection using sparse logistic regression, Bioinformatics, vol.19, issue.17, pp.2246-2253, 2003.
DOI : 10.1093/bioinformatics/btg308

A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.
DOI : 10.1137/080716542

G. Yuan, C. Ho, and C. Lin, An improved GLMNET for L1-regularized logistic regression, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, pp.33-41, 2011.
DOI : 10.1145/2020408.2020421

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Convex optimization with sparsity-inducing norms, Optimization for Machine Learning, 2011.
DOI : 10.1561/2200000015
URL : https://hal.archives-ouvertes.fr/hal-00937150

H. Zou, The Adaptive Lasso and Its Oracle Properties, Journal of the American Statistical Association, vol.101, issue.476, pp.1418-1429, 2006.
DOI : 10.1198/016214506000000735

J. Fan and R. Li, Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties, Journal of the American Statistical Association, vol.96, issue.456, pp.1348-1360, 2001.
DOI : 10.1198/016214501753382273
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.128.4174

K. Knight and W. Fu, Asymptotics for lasso-type estimators, Annals of Statistics, vol.28, issue.5, pp.1356-1378, 2000.

E. Candès, M. Wakin, and S. Boyd, Enhancing Sparsity by Reweighted ℓ1 Minimization, Journal of Fourier Analysis and Applications, vol.14, issue.5-6, pp.877-905, 2008.
DOI : 10.1007/s00041-008-9045-x

L. Laporte, R. Flamary, S. Canu, S. Dejean, and J. Mothe, Nonconvex Regularizations for Feature Selection in Ranking With Sparse SVM, IEEE Transactions on Neural Networks and Learning Systems, vol.25, issue.6, pp.1118-1130, 2014.
DOI : 10.1109/TNNLS.2013.2286696
URL : https://hal.archives-ouvertes.fr/hal-00905550

G. Gasso, A. Rakotomamonjy, and S. Canu, Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming, IEEE Transactions on Signal Processing, vol.57, issue.12, pp.4686-4698, 2009.
DOI : 10.1109/TSP.2009.2026004
URL : https://hal.archives-ouvertes.fr/hal-00439453

P. L. Combettes and J. Pesquet, Proximal splitting methods in signal processing, in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp.185-212, 2011.

J. Lee, Y. Sun, and M. Saunders, Proximal Newton-type methods for convex optimization, Advances in Neural Information Processing Systems, pp.836-844, 2012.

S. Becker and J. Fadili, A quasi-Newton proximal splitting method, Advances in Neural Information Processing Systems, pp.2618-2626, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01080081

H. A. Le Thi and T. Pham Dinh, The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems, Annals of Operations Research, vol.133, issue.1-4, pp.23-46, 2005.

T. Pham Dinh and H. A. Le Thi, Convex analysis approach to DC programming: Theory, algorithms and applications, Acta Mathematica Vietnamica, vol.22, issue.1, pp.287-355, 1997.

F. Akoa, Combining DC Algorithms (DCAs) and Decomposition Techniques for the Training of Nonpositive–Semidefinite Kernels, IEEE Transactions on Neural Networks, vol.19, issue.11, pp.1854-1872, 2008.
DOI : 10.1109/TNN.2008.2003299

R. Jenatton, J. Mairal, G. Obozinski, and F. Bach, Proximal methods for sparse hierarchical dictionary learning, Proceedings of the International Conference on Machine Learning, pp.487-494, 2010.

A. Rakotomamonjy, Direct Optimization of the Dictionary Learning Problem, IEEE Transactions on Signal Processing, vol.61, issue.22, pp.5495-5506, 2013.
DOI : 10.1109/TSP.2013.2278158
URL : https://hal.archives-ouvertes.fr/hal-00850248

N. Srebro, J. Rennie, and T. S. Jaakkola, Maximum-margin matrix factorization, in Advances in Neural Information Processing Systems, pp.1329-1336, 2004.

S. Ertekin, L. Bottou, and C. Giles, Nonconvex Online Support Vector Machines, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, issue.2, pp.368-381, 2011.
DOI : 10.1109/TPAMI.2010.109

R. Collobert, F. Sinz, J. Weston, and L. Bottou, Trading convexity for scalability, Proceedings of the 23rd International Conference on Machine Learning, ICML '06, pp.201-208, 2006.
DOI : 10.1145/1143844.1143870

A. L. Yuille and A. Rangarajan, The concave-convex procedure (CCCP), Advances in Neural Information Processing Systems, pp.1033-1040, 2002.

N. Courty, R. Flamary, and D. Tuia, Domain Adaptation with Regularized Optimal Transport, Machine Learning and Knowledge Discovery in Databases, pp.274-289, 2014.
DOI : 10.1007/978-3-662-44848-9_18
URL : https://hal.archives-ouvertes.fr/hal-01018698

R. Jenatton, G. Obozinski, and F. Bach, Structured sparse principal component analysis, Proceedings of the International Conference on Artificial Intelligence and Statistics, pp.366-373, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00414158

E. Richard, P. Savalle, and N. Vayatis, Estimation of simultaneously sparse and low rank matrices, Proceedings of the International Conference on Machine Learning, Omnipress, 2012.

Y. Deng, Q. Dai, R. Liu, Z. Zhang, and S. Hu, Low-rank structure learning via nonconvex heuristic recovery, IEEE Transactions on Neural Networks and Learning Systems, pp.383-396, 2013.

K. Zhong, E. Yen, I. S. Dhillon, and P. K. Ravikumar, Proximal quasi-Newton for computationally intensive L1-regularized M-estimators, Advances in Neural Information Processing Systems 27, pp.2375-2383, 2014.

M. Figueiredo, R. Nowak, and S. Wright, Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems, IEEE Journal of Selected Topics in Signal Processing, vol.1, issue.4, pp.586-598, 2007.
DOI : 10.1109/JSTSP.2007.910281

G. Golub and C. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins University Press, 1996.

P. Gong, C. Zhang, Z. Lu, J. Huang, and J. Ye, A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems, Proceedings of the 30th International Conference on Machine Learning, pp.37-45, 2013.

Z. Lu, Sequential convex programming methods for a class of structured nonlinear programming, ArXiv:1210, 2012.

P. Loh and M. J. Wainwright, Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima, Advances in Neural Information Processing Systems 26, pp.476-484, 2013.

T. Pham Dinh and H. A. Le Thi, DC optimization algorithms for solving the trust region subproblem, SIAM Journal on Optimization, vol.8, pp.476-505, 1998.

R. Collobert, F. Sinz, J. Weston, and L. Bottou, Large scale transductive SVMs, Journal of Machine Learning Research, vol.7, pp.1687-1712, 2006.

H. Mine and M. Fukushima, A minimization method for the sum of a convex function and a continuously differentiable function, Journal of Optimization Theory and Applications, vol.33, issue.1, pp.9-23, 1981.
DOI : 10.1007/BF00935173

E. Chouzenoux, J. Pesquet, and A. Repetti, Variable Metric Forward–Backward Algorithm for Minimizing the Sum of a Differentiable Function and a Convex Function, Journal of Optimization Theory and Applications, vol.162, issue.1, pp.107-132, 2014.
DOI : 10.1007/s10957-013-0465-7
URL : https://hal.archives-ouvertes.fr/hal-00789970

S. Sra, Scalable nonconvex inexact proximal splitting, Advances in Neural Information Processing Systems, pp.530-538, 2012.

H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka–Łojasiewicz Inequality, Mathematics of Operations Research, vol.35, issue.2, pp.438-457, 2010.
DOI : 10.1287/moor.1100.0449

H. Attouch, J. Bolte, and B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods, Mathematical Programming, pp.91-129, 2013.
DOI : 10.1007/s10107-011-0484-9
URL : https://hal.archives-ouvertes.fr/inria-00636457

J. Bolte, S. Sabach, and M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Mathematical Programming, pp.459-494, 2014.
DOI : 10.1007/s10107-013-0701-9
URL : https://hal.archives-ouvertes.fr/hal-00916090

T. Zhang, Analysis of multi-stage convex relaxation for sparse regularization, Journal of Machine Learning Research, vol.11, pp.1081-1107, 2010.

A. Boisbunon, R. Flamary, and A. Rakotomamonjy, Active set strategy for high-dimensional non-convex sparse optimization problems, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.1517-1521, 2014.
DOI : 10.1109/ICASSP.2014.6853851
URL : https://hal.archives-ouvertes.fr/hal-01025585

A. Rakotomamonjy, R. Flamary, G. Gasso, and S. Canu, ℓp-ℓq Penalty for Sparse Linear and Sparse Multiple Kernel Multitask Learning, IEEE Transactions on Neural Networks, vol.22, issue.8, pp.1307-1320, 2011.
DOI : 10.1109/TNN.2011.2157521

O. Chapelle and A. Zien, Semi-supervised classification by low density separation, Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, pp.57-64, 2005.

T. Joachims, Transductive inference for text classification using SVMs, Proceedings of the 16th International Conference on Machine Learning, pp.200-209, 1999.