The tradeoffs of large scale learning, Advances in Neural Information Processing Systems 21, 2008. ,
Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning, Advances in Neural Information Processing Systems 24, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00608041
Ad click prediction, Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '13, 2013. ,
DOI : 10.1145/2487575.2488200
Multi-core processors -an overview, 2011. ,
Parallelized stochastic gradient descent, Advances in Neural Information Processing Systems 23, 2010. ,
Slow learners are fast, Advances in Neural Information Processing Systems 22, 2009. ,
HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, Advances in Neural Information Processing Systems 24, 2011. ,
Passcode: Parallel asynchronous stochastic dual co-ordinate descent, International Conference on Machine Learning, 2015. ,
Perturbed Iterate Analysis for Asynchronous Stochastic Optimization, SIAM Journal on Optimization, vol.27, issue.4, 2015. ,
DOI : 10.1137/16M1057000
Optimal distributed online prediction using mini-batches, Journal of Machine Learning Research, 2012. ,
Parallelizing Stochastic Approximation Through Mini-Batching and Tail-Averaging, 2016. ,
Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, 2011. ,
Topmoumoute online natural gradient algorithm, Advances in Neural Information Processing Systems, 2008. ,
Efficient mini-batch training for stochastic optimization, Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '14, 2014. ,
DOI : 10.1145/2623330.2623612
ASAGA: Asynchronous Parallel SAGA, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01407833
Estimation, optimization, and parallelism when data is sparse, Advances in Neural Information Processing Systems 26, 2013. ,
Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Advances in Neural Information Processing Systems 26, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00831977
On the importance of initialization and momentum in deep learning, International Conference on Machine Learning, 2013. ,
Adding vs. averaging in distributed primal-dual optimization, International Conference on Machine Learning, 2015. ,
Batched Stochastic Gradient Descent with Weighted Sampling, 2016. ,
DOI : 10.1007/s11075-007-9136-9
Confidence level solutions for stochastic programming, Automatica, vol.44, issue.6, 2008. ,
DOI : 10.1016/j.automatica.2008.01.017
URL : http://ecolu-info.unige.ch/~logilab/reports/GradStoc.ps
Averaged least-mean-squares: Bias-variance trade-offs and optimal sampling distributions, Artificial Intelligence and Statistics, 2015. ,
Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems 26, 2013. ,
SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems 27, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01016843
Identifying suspicious URLs, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009. ,
DOI : 10.1145/1553374.1553462
Convex Optimization: Algorithms and Complexity, Machine Learning, 2015. ,
DOI : 10.1561/2200000050