Optimal Filtering, 1979.
Natural Gradient Works Efficiently in Learning, Neural Computation, vol. 10, no. 2, pp. 251-276, 1998.
Spectral Learning from a Single Trajectory under Finite-State Policies, Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, pp. 361-370, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01590940
Large-scale machine learning with stochastic gradient descent, Proceedings of COMPSTAT'2010, pp. 177-186, 2010.
On-line Learning and Stochastic Approximations, pp. 9-42, 1999.
DOI: 10.1017/CBO9780511569920.003
URL: http://leon.bottou.org/publications/pdf/online-1998.pdf
Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, 1994.
DOI: 10.1109/72.279181
URL: http://www.research.microsoft.com/~patrice/PDF/long_term.pdf
Inference in Hidden Markov Models, 2005.
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, Advances in Neural Information Processing Systems, pp. 2933-2941, 2014.
SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems, 2014.
URL: https://hal.archives-ouvertes.fr/hal-01016843
General results on the convergence of stochastic algorithms, IEEE Transactions on Automatic Control, vol. 41, no. 9, pp. 1245-1255, 1996.
DOI: 10.1109/9.536495
Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01275431
Adaptive subgradient methods for online learning and stochastic optimization, The Journal of Machine Learning Research, vol. 12, pp. 2121-2159, 2011.
Convergence of a stochastic approximation version of the EM algorithm, The Annals of Statistics, vol. 27, pp. 94-128, 1999.
Convergence of Stochastic Algorithms: From the Kushner-Clark Theorem to the Lyapunov Functional Method, Advances in Applied Probability, vol. 28, pp. 1072-1094, 1996.
Multilayer feedforward networks are universal approximators, Neural Networks, vol. 2, no. 5, pp. 359-366, 1989.
DOI: 10.1016/0893-6080(89)90020-8
A tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the "echo state network" approach, Technical Report 159, German National Research Center for Information Technology, 2002.
Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, Advances in Neural Information Processing Systems 26, 2013.
Stochastic Approximation Methods for Constrained and Unconstrained Systems, vol. 26, Applied Mathematical Sciences, 1978.
DOI: 10.1007/978-1-4684-9352-8
Stochastic Approximation and Recursive Algorithms and Applications, 2003.
Analysis of adaptive step-size SA algorithms for parameter tracking, IEEE Transactions on Automatic Control, vol. 40, pp. 1403-1410, 1995.
Analysis of recursive stochastic algorithms, IEEE Transactions on Automatic Control, vol. 22, no. 4, pp. 551-575, 1977.
DOI: 10.1109/TAC.1977.1101561
Ergodicity and speed of convergence to equilibrium for diffusion processes, lecture notes available on the author's web page, at https.
Automatic step-size adaptation in incremental supervised learning, Master's thesis, 2010.
Group Invariant Scattering, Communications on Pure and Applied Mathematics, vol. 65, no. 10, pp. 1331-1398, 2012.
URL: http://arxiv.org/pdf/1101.2286
Gradient-based Hyperparameter Optimization through Reversible Learning, Proceedings of The 32nd International Conference on Machine Learning, 2015.
Speed learning on the fly, preprint, 2015.
Training recurrent networks online without backtracking, preprint, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01228954
Riemannian metrics for neural networks I: feedforward networks, Information and Inference, pp. 108-153, 2015.
URL: https://hal.archives-ouvertes.fr/hal-00857982
Riemannian metrics for neural networks II: recurrent networks and learning symbolic data sequences, Information and Inference, pp. 153-193, 2015.
URL: https://hal.archives-ouvertes.fr/hal-00857980
Online natural gradient as a Kalman filter, preprint, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01660622
Gradient calculations for dynamic recurrent neural networks: a survey, IEEE Transactions on Neural Networks, vol. 6, pp. 1212-1228, 1995.
A Stochastic Approximation Method, The Annals of Mathematical Statistics, vol. 22, no. 3, pp. 400-407, 1951.
Explorations on high dimensional landscapes, accepted at an ICLR 2015 workshop, available on arXiv at https.
Minimizing finite sums with the Stochastic Average Gradient, 2013.
URL: https://hal.archives-ouvertes.fr/hal-00860051
No More Pesky Learning Rates, Proceedings of The 30th International Conference on Machine Learning, edited by Sanjoy Dasgupta and David McAllester, JMLR, pp. 343-351, 2013.
Unbiased Online Recurrent Optimization, preprint, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01666483
Unbiasing Truncated Backpropagation Through Time, preprint, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01660627
L'apprentissage profond : une révolution en intelligence artificielle, inaugural lecture at the Collège de France, available at https.
Oh the humanity! Poker computer trounces humans in big step for AI, 2017.