Deep learning, Nature, vol.521, issue.7553, pp.436-444, 2015. ,
DOI : 10.1038/nature14539
Comparison for a 784-80-70-60-50-40-30-20-10 network ,
Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, vol.5, issue.2, 1994. ,
DOI : 10.1109/72.279181
URL : http://www.research.microsoft.com/~patrice/PDF/long_term.pdf
Optimization methods for large-scale machine learning, Tech. Rep, 2016. ,
On the momentum term in gradient descent learning algorithms, Neural Networks, vol.12, issue.1, pp.145-151, 1999. ,
DOI : 10.1016/S0893-6080(98)00116-6
A method of solving a convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady, pp.372-376, 1983. ,
Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res, vol.12, pp.2121-2159, 2011. ,
ADADELTA: An adaptive learning rate method Available online at https, 2012. ,
Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude, COURSERA: Neural Networks for Machine Learning, 2012. ,
Adam: A method for stochastic optimization, Int. Conf. Learn. Representations, pp.14-16, 2014. ,
On the importance of initialization and momentum in deep learning, Int. Conf, pp.16-21, 2013. ,
Reducing the Dimensionality of Data with Neural Networks, Science, vol.313, issue.5786, pp.504-507, 2006. ,
DOI : 10.1126/science.1127647
Deep learning via Hessian-free optimization, Int. Conf. Mach. Learn, pp.21-24, 2010. ,
DOI : 10.1007/978-3-642-35289-8_27
URL : http://www.cs.toronto.edu/~jmartens/docs/HF_book_chapter.pdf
Learning recurrent neural networks with hessian-free optimization, Proc. Int'l Conf. Machine Learning, 2011. ,
DOI : 10.1007/978-3-642-35289-8_27
URL : http://www.cs.toronto.edu/~jmartens/docs/HF_book_chapter.pdf
Krylov subspace descent for deep learning, Int. Conf. Artif. Intell. Statist, pp.21-23, 2012. ,
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, Ann. Conf. Neur. Inform. Proc. Syst, pp.8-11, 2014. ,
Computing a Trust Region Step, SIAM Journal on Scientific and Statistical Computing, vol.4, issue.3, pp.553-572, 1983. ,
DOI : 10.1137/0904038
Recent advances in trust region algorithms, Mathematical Programming, vol.146, issue.6, pp.249-281, 2015. ,
DOI : 10.1007/s10107-013-0679-3
A Stochastic Majorize-Minimize Subspace Algorithm for Online Penalized Least Squares Estimation, IEEE Transactions on Signal Processing, vol.65, issue.18, 2017. ,
DOI : 10.1109/TSP.2017.2709265
URL : https://hal.archives-ouvertes.fr/hal-01613204
A Subspace Minimization Method for the Trust-Region Step, SIAM Journal on Optimization, vol.20, issue.3, pp.1439-1461, 2010. ,
DOI : 10.1137/08072440X
Fast Exact Multiplication by the Hessian, Neural Computation, vol.6, issue.1, pp.147-160, 1994. ,
DOI : 10.1162/neco.1994.6.1.147
Cubic regularization of Newton method and its global performance, Mathematical Programming, vol.108, issue.1, pp.177-205, 2006. ,
DOI : 10.1007/s10107-006-0706-8