K. Fukushima, Neocognitron: a self-organizing neural network for a mechanism of pattern recognition unaffected by shift in position, Biol. Cybern., vol.36, issue.4, pp.193-202, 1980.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE, vol.86, issue.11, pp.2278-2324, 1998.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Proceedings of the Advances in Neural Information Processing Systems, pp.1097-1105, 2012.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet large scale visual recognition challenge, Int. J. Comput. Vis. (IJCV), vol.115, issue.3, pp.211-252, 2015.

T. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick et al., Microsoft COCO: common objects in context, 2015.

L. Bottou, Large-scale machine learning with stochastic gradient descent, Proceedings of COMPSTAT'2010, pp.177-186, 2010.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.770-778, 2016.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014.

S. Zhang, A. E. Choromanska, and Y. LeCun, Deep learning with elastic averaging SGD, Proceedings of the Advances in Neural Information Processing Systems 28, pp.685-693, 2015.

J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin et al., Large scale distributed deep networks, Proceedings of the Advances in Neural Information Processing Systems 25, pp.1223-1231, 2012.

S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, Randomized gossip algorithms, IEEE Trans. Inf. Theory, vol.52, issue.6, pp.2508-2530, 2006.

I. Colin, A. Bellet, J. Salmon, and S. Clémençon, Gossip dual averaging for decentralized optimization of pairwise functions, Proceedings of the Thirty-Third International Conference on Machine Learning, Proceedings of Machine Learning Research, vol.48, pp.1388-1396, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01329315

J. Fellus, D. Picard, and P. Gosselin, Asynchronous gossip principal components analysis, Neurocomputing, vol.169, pp.262-271, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01148639

G. Di Fatta, F. Blasa, S. Cafiero, and G. Fortino, Epidemic k-means clustering, Proceedings of the IEEE Eleventh International Conference on Data Mining Workshops (ICDMW), pp.151-158, 2011.

J. Fellus, D. Picard, and P. Gosselin, Decentralized k-means using randomized gossip protocols for clustering large datasets, Proceedings of the IEEE Thirteenth International Conference on Data Mining Workshops, pp.599-606, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00915822

V. Vapnik, The Nature of Statistical Learning Theory, 1995.

D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming, 2008.

D. P. Kingma and J. Ba, Adam: a method for stochastic optimization, Proceedings of the International Conference on Learning Representations, 2015.

G. Hinton, Overview of mini-batch gradient descent, Neural Networks for Machine Learning, vol.6, 2013.

G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, Densely connected convolutional networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.4700-4708, 2017.

H. Ma, F. Mao, and G. W. Taylor, Theano-MPI: a Theano-based distributed training framework, Proceedings of the European Conference on Parallel Processing, pp.800-813, 2016.

A. Choromanska, M. Henaff, M. Mathieu, G. Ben Arous, and Y. LeCun, The loss surfaces of multilayer networks, Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol.38, pp.192-204, 2015.

D. Kempe, A. Dobra, and J. Gehrke, Gossip-based computation of aggregate information, Proceedings of the Forty-Fourth Annual IEEE Symposium on Foundations of Computer Science, pp.482-491, 2003.

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn., vol.3, issue.1, pp.1-122, 2011.

A. Krizhevsky, Learning Multiple Layers of Features From Tiny Images, 2009.

L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus, Regularization of neural networks using DropConnect, Proceedings of the International Conference on Machine Learning, pp.1058-1066, 2013.

R. Collobert, C. Farabet, K. Kavukcuoglu, and S. Chintala, Torch7.