D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, 1989.

R. Collobert, S. Bengio, and Y. Bengio, A parallel mixture of SVMs for very large scale problems, Neural Computation, vol.14, issue.5, 2002.

C. De Sa, C. Zhang, K. Olukotun, and C. Ré, Taming the wild: a unified analysis of Hogwild!-style algorithms, NIPS, 2015.

A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, NIPS, 2014.

J. C. Duchi, S. Chaturapruek, and C. Ré, Asynchronous stochastic convex optimization, NIPS, 2015.

T. Hofmann, A. Lucchi, S. Lacoste-Julien, and B. McWilliams, Variance reduced stochastic gradient descent with neighbors, NIPS, 2015.

C.-J. Hsieh, H.-F. Yu, and I. S. Dhillon, PASSCoDe: Parallel ASynchronous Stochastic dual Co-ordinate Descent, ICML, 2015.

R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, NIPS, 2013.

N. Le Roux, M. Schmidt, and F. Bach, A stochastic gradient method with an exponential convergence rate for finite training sets, NIPS, 2012.

D. D. Lewis, Y. Yang, T. G. Rose, and F. Li, RCV1: A new benchmark collection for text categorization research, JMLR, 2004.

X. Lian, Y. Huang, Y. Li, and J. Liu, Asynchronous parallel stochastic gradient for nonconvex optimization, NIPS, 2015.

J. Liu, S. J. Wright, C. Ré, V. Bittorf, and S. Sridhar, An asynchronous parallel stochastic coordinate descent algorithm, Journal of Machine Learning Research, vol.16, pp.285-322, 2015.

J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, Identifying suspicious URLs: an application of large-scale online learning, ICML, 2009.

H. Mania, X. Pan, D. Papailiopoulos, B. Recht, K. Ramchandran, and M. I. Jordan, Perturbed iterate analysis for asynchronous stochastic optimization, arXiv preprint, 2015.

F. Niu, B. Recht, C. Ré, and S. Wright, Hogwild!: a lock-free approach to parallelizing stochastic gradient descent, NIPS, 2011.

S. J. Reddi, A. Hefny, S. Sra, B. Póczos, and A. Smola, On variance reduction in stochastic gradient descent and its asynchronous variants, NIPS, 2015.

M. Schmidt, N. Le Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, arXiv preprint, 2013.

S. Shalev-Shwartz and T. Zhang, Stochastic dual coordinate ascent methods for regularized loss minimization, JMLR, vol.14, issue.1, pp.567-599, 2013.

S. Zhao and W. Li, Fast asynchronous parallel stochastic gradient descent, AAAI, 2016.

Covertype. The task associated with our third dataset is a binary classification problem (down from 7 classes originally, following the pre-treatment of Collobert et al. [2]; a sketch of this relabeling follows). The features are cartographic variables.
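For illustration, here is a minimal Scala sketch of such a binary relabeling. The specific mapping (class 2, the majority class, versus the rest) is our assumption about the pre-treatment of Collobert et al. [2], and the object and method names are hypothetical.

object CovertypeRelabel {
  // Assumed pre-treatment: majority class 2 maps to +1, the other six classes to -1.
  def toBinary(coverType: Int): Double =
    if (coverType == 2) 1.0 else -1.0

  def main(args: Array[String]): Unit = {
    val original = Seq(1, 2, 3, 7)   // raw 7-class cover type labels
    println(original.map(toBinary))  // List(-1.0, 1.0, -1.0, -1.0)
  }
}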

RealSim. We only use our fourth dataset for non-parallel experiments and a specific compare-and-swap test (a sketch of such an update follows). It consists of UseNet articles taken from four discussion groups (simulated auto racing, simulated aviation, real autos, real aviation).
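As a rough sketch of the kind of lock-free update a compare-and-swap test exercises, the Scala snippet below retries a CAS on one coordinate of a shared iterate until it succeeds. The class and method names are ours, not the implementation's; the only assumption is access to the standard JVM atomics.

import java.util.concurrent.atomic.AtomicLongArray

// Doubles are stored as raw long bits because the JVM exposes
// compareAndSet on AtomicLongArray but not on an array of doubles.
class AtomicWeights(dim: Int) {
  private val bits = new AtomicLongArray(dim)

  def get(i: Int): Double =
    java.lang.Double.longBitsToDouble(bits.get(i))

  // Retry the CAS until no concurrent writer has interleaved.
  def add(i: Int, delta: Double): Unit = {
    var done = false
    while (!done) {
      val oldBits = bits.get(i)
      val newVal  = java.lang.Double.longBitsToDouble(oldBits) + delta
      done = bits.compareAndSet(i, oldBits,
        java.lang.Double.doubleToRawLongBits(newVal))
    }
  }
}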

Hardware. All experiments were run on a Dell PowerEdge 920 machine with 4 Intel Xeon E7-4830 v2 processors with 10 2.2GHz cores each.

Software. All algorithms were implemented in Scala; the software stack consisted of a Linux operating system running Scala 2.