Z. Allen-Zhu, Katyusha: The first direct acceleration of stochastic gradient methods, Proceedings of the Symposium on Theory of Computing, pp.1200-1205, 2017.

F. Baccelli, G. Cohen, G. J. Olsder, and J.-P. Quadrat, Synchronization and Linearity: An Algebra for Discrete Event Systems, 1992.

L. Bottou, Large-scale machine learning with stochastic gradient descent, Proceedings of COMPSTAT, pp.177-186, 2010.

S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, Randomized gossip algorithms, IEEE Transactions on Information Theory, vol.52, issue.6, pp.2508-2530, 2006.

S. Bubeck, Convex optimization: Algorithms and complexity, Foundations and Trends in Machine Learning, vol.8, pp.231-357, 2015.

J. Chen, X. Pan, R. Monga, S. Bengio, and R. Jozefowicz, Revisiting distributed synchronous SGD, arXiv preprint arXiv:1604.00981, 2016.

I. Colin, A. Bellet, J. Salmon, and S. Clémençon, Gossip dual averaging for decentralized optimization of pairwise functions, Proceedings of the International Conference on Machine Learning, vol.48, pp.1388-1396, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01329315

A. Defazio, A simple practical accelerated method for finite sums, Advances in Neural Information Processing Systems, pp.676-684, 2016.

A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems, pp.1646-1654, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843

J. C. Duchi, A. Agarwal, and M. J. Wainwright, Dual averaging for distributed optimization: Convergence analysis and network scaling, IEEE Transactions on Automatic Control, vol.57, issue.3, pp.592-606, 2012.

O. Fercoq and P. Richtárik, Accelerated, parallel, and proximal coordinate descent, SIAM Journal on Optimization, vol.25, issue.4, pp.1997-2023, 2015.

L. He, A. Bian, and M. Jaggi, COLA: Decentralized linear learning, Advances in Neural Information Processing Systems, pp.4536-4546, 2018.