J. M. Alvarez and M. Salzmann, Learning the number of neurons in deep networks, Advances in Neural Information Processing Systems, pp.2270-2278, 2016.

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp.249-256, 2010.

A. Graves, Practical variational inference for neural networks, Advances in Neural Information Processing Systems, pp.2348-2356, 2011.

S. Han, H. Mao, and W. J. Dally, Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding, 2015.

G. E. Hinton and D. van Camp, Keeping neural networks simple, International Conference on Artificial Neural Networks, pp.11-18, 1993.

L. Hörmander, The Analysis of Linear Partial Differential Operators I, 1998.

M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. Saul, An introduction to variational methods for graphical models, Machine Learning, vol.37, issue.2, pp.183-233, 1999.

D. P. Kingma, T. Salimans, and M. Welling, Variational dropout and the local reparameterization trick, Advances in Neural Information Processing Systems, pp.2575-2583, 2015.

C. Li, C. Chen, D. E. Carlson, and L. Carin, Preconditioned stochastic gradient Langevin dynamics for deep neural networks, Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp.1788-1794, 2016.

H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, Pruning filters for efficient ConvNets, 2016.

Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan et al., Learning efficient convolutional networks through network slimming, 2017 IEEE International Conference on Computer Vision (ICCV), pp.2755-2763, 2017.

C. Louizos, K. Ullrich, and M. Welling, Bayesian compression for deep learning, Advances in Neural Information Processing Systems, pp.3288-3298, 2017.

D. J. C. MacKay, Bayesian model comparison and backprop nets, Advances in Neural Information Processing Systems, pp.839-846, 1992.

D. J. C. MacKay, A practical Bayesian framework for backpropagation networks, Neural Computation, vol.4, issue.3, pp.448-472, 1992.

D. J. C. MacKay, Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks, Network: Computation in Neural Systems, vol.6, pp.469-505, 1995.

D. J. C. MacKay, Information theory, inference and learning algorithms, 2003.

G. Marceau-Caron and Y. Ollivier, Natural Langevin dynamics for neural networks, International Conference on Geometric Science of Information, pp.451-459, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01655949

D. Molchanov, A. Ashukha, and D. Vetrov, Variational dropout sparsifies deep neural networks, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.2498-2507, 2017.

B. A. Olshausen and D. Field, Sparse coding with an overcomplete basis set: a strategy employed by V1?, Vision Research, vol.37, pp.3311-3325, 1997.

A. D. Polyanin and A. Manzhirov, Handbook of integral equations, 1998.

S. Scardapane, D. Comminiello, A. Hussain, and A. Uncini, Group sparse regularization for deep neural networks, Neurocomputing, vol.241, pp.81-89, 2017.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014.

E. M. Stein and G. Weiss, Introduction to Fourier analysis on Euclidean spaces, vol.32, 2016.

R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Methodological), vol.58, issue.1, pp.267-288, 1996.

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.68, issue.1, pp.49-67, 2006.

C. Zhang, S. Bengio, and Y. Singer, Are all layers created equal?, 2019.