A. Agarwal, S. Negahban, and M. J. Wainwright, Fast global convergence rates of gradient methods for high-dimensional statistical recovery, Advances in Neural Information Processing Systems, pp.37-45, 2010.

A. Agarwal, S. Negahban, and M. J. Wainwright, Fast global convergence of gradient methods for high-dimensional statistical recovery, The Annals of Statistics, vol.40, issue.5, pp.2452-2482, 2012.

Z. Allen-Zhu, Katyusha: The first direct acceleration of stochastic gradient methods, arXiv preprint, 2016.

A. Defazio, A simple practical accelerated method for finite sums, Advances in Neural Information Processing Systems, vol.29, pp.676-684, 2016.

A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems, pp.1646-1654, 2014.

O. Fercoq and Z. Qu, Restarting accelerated gradient methods with a rough strong convexity estimate, arXiv preprint, 2016.

O. Fercoq and Z. Qu, Adaptive restart of accelerated gradient methods under local quadratic growth condition, arXiv preprint, 2017.

R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems, pp.315-323, 2013.

H. Karimi, J. Nutini, and M. Schmidt, Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp.795-811, 2016.

G. Lan and Y. Zhou, An optimal randomized incremental gradient method, arXiv preprint, 2015.

J. Langford, L. Li, and T. Zhang, Sparse online learning via truncated gradient, Journal of Machine Learning Research, vol.10, pp.777-801, 2009.

M. Lichman, UCI machine learning repository, 2013.

H. Lin, J. Mairal, and Z. Harchaoui, A universal catalyst for first-order optimization, Advances in Neural Information Processing Systems, pp.3384-3392, 2015.

Q. Lin, Z. Lu, and L. Xiao, An accelerated proximal coordinate gradient method, Advances in Neural Information Processing Systems, pp.3059-3067, 2014.

Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady, pp.372-376, 1983.

Y. Nesterov, Gradient methods for minimizing composite objective function, CORE Discussion Paper, 2007.

A. Nitanda, Stochastic proximal gradient descent with acceleration techniques, Advances in Neural Information Processing Systems, pp.1574-1582, 2014.

S. Oymak, B. Recht, and M. Soltanolkotabi, Sharp time-data tradeoffs for linear inverse problems, arXiv preprint, 2015.

M. Pilanci and M. J. Wainwright, Randomized sketches of convex programs with sharp guarantees, IEEE Transactions on Information Theory, vol.61, issue.9, pp.5096-5115, 2015.

M. Pilanci and M. J. Wainwright, Iterative Hessian sketch: Fast and accurate solution approximation for constrained least-squares, Journal of Machine Learning Research, vol.17, issue.53, pp.1-38, 2016.

C. Qu, Y. Li, and H. Xu, SAGA and restricted strong convexity, arXiv preprint, 2017.

C. Qu and H. Xu, Linear convergence of SVRG in statistical estimation, arXiv preprint, 2016.

C. Qu and H. Xu, Linear convergence of SDCA in statistical estimation, arXiv preprint, 2017.

G. Raskutti, M. J. Wainwright, and B. Yu, Restricted eigenvalue properties for correlated Gaussian designs, Journal of Machine Learning Research, vol.11, pp.2241-2259, 2010.

N. L. Roux, M. Schmidt, and F. R. Bach, A stochastic gradient method with an exponential convergence rate for finite training sets, Advances in Neural Information Processing Systems 25, pp.2663-2671, 2012.

S. Shalev-Shwartz and T. Zhang, Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization, International Conference on Machine Learning, pp.64-72, 2014.

O. Shamir and T. Zhang, Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes, International Conference on Machine Learning, pp.71-79, 2013.

L. Xiao and T. Zhang, A Proximal Stochastic Gradient Method with Progressive Variance Reduction, SIAM Journal on Optimization, vol.24, issue.4, pp.2057-2075, 2014.

Y. Zhang and X. Lin, Stochastic primal-dual coordinate method for regularized empirical risk minimization, Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp.353-361, 2015.

University of Edinburgh. E-mail address: m.golbabaee@ed.ac.uk
University of Edinburgh. E-mail address: J.Tang@ed.ac.uk
SIERRA Project Team, INRIA. E-mail address: francis.bach@inria.fr