A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in neural information processing systems, pp.1097-1105, 2012.

A. B. Nassif, I. Shahin, I. Attili, M. Azzeh, and K. Shaalan, Speech recognition using deep neural networks: A systematic review, IEEE Access, vol.7, pp.19143-19165, 2019.

D. W. Otter, J. R. Medina, and J. K. Kalita, A survey of the usages of deep learning in natural language processing, 2018.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.770-778, 2016.

A. Canziani, A. Paszke, and E. Culurciello, An analysis of deep neural network models for practical applications, 2016.

A. Ignatov, R. Timofte, W. Chou, K. Wang, M. Wu et al., AI benchmark: Running deep neural networks on android smartphones, Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018.

S. Rallapalli, H. Qiu, A. Bency, S. Karthikeyan, R. Govindan et al., Are very deep neural networks feasible on mobile devices?, IEEE Trans. Circuits Syst. Video Technol., 2016.

S. Han, J. Kang, H. Mao, Y. Hu, X. Li et al., ESE: Efficient speech recognition engine with sparse LSTM on FPGA, Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp.75-84, 2017.

E. Strubell, A. Ganesh, and A. McCallum, Energy and policy considerations for deep learning in NLP, ACL, 2019.

R. Schwartz, J. Dodge, N. A. Smith, and O. Etzioni, Green AI, 2019.

M. Denil, B. Shakibi, L. Dinh, and N. de Freitas, Predicting parameters in deep learning, Advances in neural information processing systems, pp.2148-2156, 2013.

J. Friedman, T. Hastie, and R. Tibshirani, Regularization paths for generalized linear models via coordinate descent, Journal of Statistical Software, vol.33, issue.1, pp.1-22, 2010.

T. Hastie, R. Tibshirani, and M. Wainwright, Statistical learning with sparsity: The lasso and generalizations, 2015.

S. Han, J. Pool, J. Tran, and W. Dally, Learning both weights and connections for efficient neural network, Advances in neural information processing systems, pp.1135-1143, 2015.

E. Tartaglione, S. Lepsøy, A. Fiandrotti, and G. Francini, Learning sparse neural networks via sensitivity-driven regularization, Advances in Neural Information Processing Systems, pp.3878-3888, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01995794

A. N. Gomez, I. Zhang, K. Swersky, Y. Gal, and G. E. Hinton, Learning sparse networks using targeted dropout, 2019.

M. Carreira-Perpiñán and Y. Idelbayev, Learning-compression algorithms for neural net pruning, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, Pruning filters for efficient convnets, 2016.

Y. He, X. Zhang, and J. Sun, Channel pruning for accelerating very deep neural networks, Proceedings of the IEEE International Conference on Computer Vision, pp.1389-1397, 2017.

H. Hu, R. Peng, Y. Tai, and C. Tang, Network trimming: A data-driven neuron pruning approach towards efficient deep architectures, 2016.

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.68, issue.1, pp.49-67, 2006.

S. Scardapane, D. Comminiello, A. Hussain, and A. Uncini, Group sparse regularization for deep neural networks, Neurocomputing, vol.241, pp.81-89, 2017.

W. Wen, C. Wu, Y. Wang, Y. Chen, and H. Li, Learning structured sparsity in deep neural networks, Advances in neural information processing systems, pp.2074-2082, 2016.

J. Friedman, T. Hastie, and R. Tibshirani, A note on the group lasso and a sparse group lasso, 2010.

A. Torfi, R. A. Shirvani, S. Soleymani, and N. M. Nasrabadi, Attention-based guided structured sparsity of deep neural networks, 2018.

Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan et al., Learning efficient convolutional networks through network slimming, Proceedings of the IEEE International Conference on Computer Vision, pp.2736-2744, 2017.

Y. Cheng, D. Wang, P. Zhou, and T. Zhang, A survey of model compression and acceleration for deep neural networks, 2017.

D. Zhang, Y. Hu, J. Ye, X. Li, and X. He, Matrix completion by truncated nuclear norm regularization, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.

J. Cavazza, P. Morerio, B. Haeffele, C. Lane, V. Murino et al., Dropout as a low-rank regularizer for matrix factorization, International Conference on Artificial Intelligence and Statistics, pp.435-444, 2018.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

J. Moreau, Proximité et dualité dans un espace hilbertien, Bull. Soc. Math. France, vol.93, pp.273-299, 1965.

P. Lions and B. Mercier, Splitting algorithms for the sum of two nonlinear operators, SIAM Journal on Numerical Analysis, vol.16, issue.6, pp.964-979, 1979.

P. L. Combettes and J. Pesquet, Proximal splitting methods in signal processing, in Fixed-point algorithms for inverse problems in science and engineering, pp.185-212, 2011.

S. Mosci, L. Rosasco, M. Santoro, A. Verri, and S. Villa, Solving structured sparsity regularization with proximal methods, Machine Learning and Knowledge Discovery in Databases, pp.418-433, 2010.

S. Sra, S. Nowozin, and S. J. Wright, Optimization for Machine Learning, 2012.

T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu, The entire regularization path for the support vector machine, Journal of Machine Learning Research, vol.5, pp.1391-1415, 2004.

J. Mairal and B. Yu, Complexity analysis of the lasso regularization path, Proceedings of the 29th International Conference on Machine Learning, pp.353-360, 2012.

H. Zhou, J. M. Alvarez, and F. Porikli, Less is more: Towards compact CNNs, European Conference on Computer Vision, pp.662-677, 2016.

J. M. Alvarez and M. Salzmann, Learning the number of neurons in deep networks, Advances in Neural Information Processing Systems, pp.2270-2278, 2016.

Z. Huang and N. Wang, Data-driven sparse structure selection for deep neural networks, Proceedings of the European Conference on Computer Vision (ECCV), pp.304-320, 2018.

J. Yoon and S. J. Hwang, Combined group and exclusive sparsity for deep neural networks, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.3958-3966, 2017.

D. Zhang, H. Wang, M. Figueiredo, and L. Balzano, Learning to share: Simultaneous parameter tying and sparsification in deep learning, 2018.

U. Oswal, C. Cox, M. Lambon-Ralph, T. Rogers, and R. Nowak, Representational similarity learning with application to brain networks, International Conference on Machine Learning, pp.1041-1049, 2016.

D. Zhang, J. Katz-Samuels, M. A. Figueiredo, and L. Balzano, Simultaneous sparsity and parameter tying for deep learning using ordered weighted l1 regularization, 2018 IEEE Statistical Signal Processing Workshop (SSP), pp.65-69, 2018.

M. Figueiredo and R. Nowak, Ordered weighted l1 regularized regression with strongly correlated covariates: Theoretical aspects, Artificial Intelligence and Statistics, pp.930-938, 2016.

S. Lin, R. Ji, Y. Li, C. Deng, and X. Li, Toward compact convnets via structure-sparsity regularized filter pruning, IEEE Transactions on Neural Networks and Learning Systems, 2019.

L. Condat, Fast projection onto the simplex and the l1 ball, Mathematical Programming Series A, vol.158, issue.1, pp.575-585, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01056171

J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra, Efficient projections onto the l1-ball for learning in high dimensions, Proceedings of the 25th international conference on Machine learning, pp.272-279, 2008.

G. Perez, M. Barlaud, L. Fillatre, and J. Régin, A filtered bucket-clustering method for projection onto the simplex and the l1-ball, Mathematical Programming, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01562642

J. Liu, S. Ji, and J. Ye, Multi-task feature learning via efficient l2,1-norm minimization, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, ser. UAI '09, pp.339-348, 2009.

M. Barlaud, A. Chambolle, and J. Caillau, Robust supervised classification and feature selection using a primal-dual method, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01992399

M. Barlaud, W. Belhajali, P. L. Combettes, and L. Fillatre, Classification and regression using an outer approximation projection-gradient method, IEEE Transactions on Signal Processing, vol.65, pp.4635-4643, 2017.

J. Frankle and M. Carbin, The lottery ticket hypothesis: Finding sparse, trainable neural networks, International Conference on Learning Representations, 2019.

H. Zhou, J. Lan, R. Liu, and J. Yosinski, Deconstructing lottery tickets: Zeros, signs, and the supermask, Advances in Neural Information Processing Systems, vol.32, pp.3597-3607, 2019.

D. Kingma and J. Ba, Adam: A method for stochastic optimization, International Conference on Learning Representations, pp.1-13, 2015.

Y. LeCun, The MNIST database of handwritten digits.

H. Xiao, K. Rasul, and R. Vollgraf, Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, 2017.

S. H. Hasanpour, M. Rouhani, M. Fayyaz, and M. Sabokrou, Let's keep it simple, using simple architectures to outperform deeper and more complex architectures, 2016.

E. Grochowski and M. Annavaram, Energy per instruction trends in Intel® microprocessors, 2006.