L. Bottou and O. Bousquet, The tradeoffs of large scale learning, Advances in Neural Information Processing Systems 21, 2008.

F. Bach and E. Moulines, Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning, Advances in Neural Information Processing Systems 24, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00608041

H. B. McMahan, G. Holt, D. Sculley, M. Young, D. Ebner et al., Ad click prediction, Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '13, 2013.
DOI : 10.1145/2487575.2488200

B. Venu, Multi-core processors - an overview, 2011.

M. Zinkevich, M. Weimer, L. Li, and A. J. Smola, Parallelized stochastic gradient descent, Advances in Neural Information Processing Systems 23, 2010.

M. Zinkevich, J. Langford, and A. J. Smola, Slow learners are fast, Advances in Neural Information Processing Systems 22, 2009.

F. Niu, B. Recht, C. Ré, and S. J. Wright, HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, Advances in Neural Information Processing Systems 24, 2011.

C. Hsieh, H. Yu, and I. Dhillon, PASSCoDe: Parallel asynchronous stochastic dual co-ordinate descent, International Conference on Machine Learning, 2015.

H. Mania, X. Pan, D. Papailiopoulos, B. Recht, K. Ramchandran et al., Perturbed Iterate Analysis for Asynchronous Stochastic Optimization, SIAM Journal on Optimization, vol.27, issue.4, 2017.
DOI : 10.1137/16M1057000

O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao, Optimal distributed online prediction using mini-batches, Journal of Machine Learning Research, 2012.

P. Jain, S. M. Kakade, R. Kidambi, P. Netrapalli, and A. Sidford, Parallelizing Stochastic Approximation Through Mini-Batching and Tail-Averaging, 2016.

J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, 2011.

N. Le Roux, P. Manzagol, and Y. Bengio, Topmoumoute online natural gradient algorithm, Advances in Neural Information Processing Systems, 2008.

M. Li, T. Zhang, Y. Chen, and A. J. Smola, Efficient mini-batch training for stochastic optimization, Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '14, 2014.
DOI : 10.1145/2623330.2623612

R. Leblond, F. Pedregosa, and S. Lacoste-Julien, ASAGA: Asynchronous Parallel SAGA, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01407833

J. Duchi, M. I. Jordan, and H. B. McMahan, Estimation, optimization, and parallelism when data is sparse, Advances in Neural Information Processing Systems 26, 2013.

F. Bach and E. Moulines, Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Advances in Neural Information Processing Systems 26, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00831977

I. Sutskever, J. Martens, G. Dahl, and G. Hinton, On the importance of initialization and momentum in deep learning, International Conference on Machine Learning, 2013.

C. Ma, V. Smith, M. Jaggi, M. I. Jordan, P. Richtárik et al., Adding vs. averaging in distributed primal-dual optimization, International Conference on Machine Learning, 2015.

D. Needell and R. Ward, Batched Stochastic Gradient Descent with Weighted Sampling, 2016.
DOI : 10.1007/s11075-007-9136-9

Y. Nesterov and J. Vial, Confidence level solutions for stochastic programming, Automatica, vol.44, issue.6, 2008.
DOI : 10.1016/j.automatica.2008.01.017
URL : http://ecolu-info.unige.ch/~logilab/reports/GradStoc.ps

A. Défossez and F. Bach, Averaged least-mean-squares: Bias-variance trade-offs and optimal sampling distributions, Artificial Intelligence and Statistics, 2015.

R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems 26, 2013.

A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems 27, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843

J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, Identifying suspicious URLs, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553462

S. Bubeck, Convex Optimization: Algorithms and Complexity, Foundations and Trends in Machine Learning, 2015.
DOI : 10.1561/2200000050