A. Berahas, J. Nocedal, and M. Takáč, A Multi-batch L-BFGS Method for Machine Learning, Proceedings of the 30th International Conference on Neural Information Processing Systems. Curran Associates Inc., USA, NIPS'16, pp.1063-1071, 2016.

J. Berg and K. Nyström, A unified deep artificial neural network approach to partial differential equations in complex geometries, Neurocomputing, vol.317, pp.28-41, 2018.

J. Bezanson, A. Edelman, S. Karpinski, and V. Shah, Julia: A fresh approach to numerical computing, SIAM Rev, vol.59, pp.65-98, 2017.

E. G. Birgin, J. L. Gardenghi, J. M. Martínez, S. A. Santos, and P. L. Toint, Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models, Math. Program, vol.163, pp.359-368, 2017.

C. Bishop, Pattern recognition and machine learning, Information Science and Statistics, 2006.

A. Björck, Numerical Methods for Least Squares Problems, vol.51, 1996.

L. Bottou, F. Curtis, and J. Nocedal, Optimization methods for large-scale machine learning, SIAM Rev, vol.60, pp.223-311, 2018.

A. Brandt, General highly accurate algebraic coarsening, Electron. Trans. Numer. Anal, vol.10, pp.1-20, 2000.

W. Briggs, V. Henson, and S. McCormick, A Multigrid Tutorial, 2000.

H. Calandra, S. Gratton, E. Riccietti, and X. Vasseur, On high-order multilevel optimization strategies, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02943218

H. Calandra, S. Gratton, E. Riccietti, and X. Vasseur, On the solution of systems of the form A^T A x = A^T b + c, preprint, 2019.

C. Cartis, N. I. Gould, and P. L. Toint, Adaptive cubic regularisation methods for unconstrained optimization. part I: motivation, convergence and numerical results, Math. Program, vol.127, pp.245-295, 2011.

T. Clees, AMG strategies for PDE systems with applications in industrial semiconductor simulation, 2005.

P. Cocquet and M. Gander, How large a shift is needed in the shifted Helmholtz preconditioner for its effective inversion by multigrid?, SIAM. J. Sci. Comput, vol.39, pp.438-478, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01578444

G. Di Muro and S. Ferrari, A constrained-optimization approach to training neural networks for smooth function approximation and system identification, IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pp.2353-2359, 2008.

M. Dissanayake and N. Phan-Thien, Neural-network-based approximations for solving partial differential equations, Commun. Numer. Methods. Eng, vol.10, pp.195-201, 1994.

W. E and B. Yu, The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems, 2017.

O. G. Ernst and M. J. Gander, Why it is Difficult to Solve Helmholtz Problems with Classical Iterative Methods, Numerical Analysis of Multiscale Problems, pp.325-363, 2012.

M. Gander and H. Zhang, A class of iterative solvers for the Helmholtz equation: factorizations, sweeping preconditioners, source transfer, single layer potentials, polarized traces, and optimized Schwarz methods, SIAM Rev, vol.61, pp.3-76, 2019.

S. Gratton, A. Sartenaer, and P. L. Toint, Recursive trust-region methods for multiscale nonlinear optimization, SIAM. J. Optim, vol.19, pp.414-444, 2008.

C. Groß and R. Krause, On the convergence of recursive trust-region methods for multiscale nonlinear optimization and applications to nonlinear mechanics, SIAM. J. Numer. Anal, vol.47, pp.3044-3069, 2009.

E. Haber, L. Ruthotto, E. Holtham, and S. Jun, Learning across scales-Multiscale methods for convolution neural networks, Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

W. Hackbusch, Multi-grid Methods and Applications, vol.4, 1985.

J. Han, A. Jentzen, and W. E, Solving high-dimensional partial differential equations using deep learning, Proc. Nat. Acad. Sci, vol.115, pp.8505-8510, 2018.

S. Haykin, Neural Networks: a Comprehensive Foundation, 1994.

R. Hecht-Nielsen, Theory of the backpropagation neural network, International 1989 Joint Conference on Neural Networks, vol.1, pp.593-605, 1989.

C. Higham and D. Higham, Deep learning: an introduction for applied mathematicians, SIAM Rev, vol.61, pp.860-891, 2019.

M. Hutzenthaler, A. Jentzen, T. Kruse, T. Nguyen, and P. Wurstemberger, Overcoming the curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations, 2018.

A. Jentzen, D. Salimova, and T. Welti, A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients, 2018.

T. Ke, M. Maire, and S. X. Yu, Multigrid Neural Architectures, Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4067-4075, 2017.

M. Kočvara and S. Mohammed, A first-order multigrid method for bound-constrained convex optimization, Optim. Method. Softw, vol.31, pp.622-644, 2016.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, NIPS'12 Proceedings of the 25th International Conference on Neural Information Processing Systems, vol.1, pp.1097-1105, 2012.

I. E. Lagaris, A. Likas, and D. I. Fotiadis, Artificial neural networks for solving ordinary and partial differential equations, IEEE Trans. Neural Netw, vol.9, pp.987-1000, 1998.

H. Lee and I. S. Kang, Neural algorithm for solving differential equations, J. Comput. Phys, vol.91, pp.110-131, 1990.

R. Lewis and S. Nash, Model problems for the multigrid optimization of systems governed by differential equations, SIAM. J. Sci. Comput, vol.26, pp.1811-1837, 2005.

R. M. Lewis and S. G. Nash, Using inexact gradients in a multilevel optimization algorithm, Comput. Optim. Appl, vol.56, pp.39-61, 2013.

X. Li, J. Lowengrub, A. Rätz, and A. Voigt, Solving PDEs in complex geometries: a diffuse domain approach, Commun. Math. Sci, vol.7, pp.81-107, 2009.

Z. Long, Y. Lu, X. Ma, and B. Dong, PDE-Net: Learning PDEs from data, Proceedings of the 35th International Conference on Machine Learning, pp.3208-3216, 2018.

L. Manevitz, A. Bitar, and D. Givoli, Neural network time series forecasting of finite-element mesh adaptation, Neurocomputing, vol.63, pp.447-463, 2005.

S. Mishra, A machine learning framework for data driven acceleration of computations of differential equations, Seminar for Applied Mathematics, 2018.

J. Misra and I. Saha, Artificial neural networks in hardware: A survey of two decades of progress, Neurocomputing, vol.74, pp.239-255, 2010.

S. Nash, A multigrid approach to discretized optimization problems, Optimization Methods and Software, vol.14, pp.99-116, 2000.

S. Nash, Properties of a class of multilevel optimization algorithms for equality constrained problems, Optimization Methods and Software, vol.29, pp.137-159, 2014.

M. Raissi and G. E. Karniadakis, Hidden physics models: machine learning of nonlinear partial differential equations, J. Comput. Phys, vol.357, pp.125-141, 2018.

M. Raissi, P. Perdikaris, and G. E. Karniadakis, Physics informed deep learning (part I): data-driven solutions of nonlinear partial differential equations, 2017.

M. Raissi, P. Perdikaris, and G. E. Karniadakis, Physics informed deep learning (part II): data-driven discovery of nonlinear partial differential equations, 2017.

M. Raissi, P. Perdikaris, and G. Karniadakis, Numerical Gaussian processes for time-dependent and nonlinear partial differential equations, SIAM. J. Sci. Comput, vol.40, pp.172-198, 2018.

P. Ramuhalli, L. Udpa, and S. S. Udpa, Finite-element neural networks for solving differential equations, IEEE Transactions on Neural Networks, vol.16, pp.1381-1392, 2005.

K. Rudd, Solving partial differential equations using artificial neural networks, 2013.

S. H. Rudy, S. L. Brunton, J. L. Proctor, and J. N. Kutz, Data-driven discovery of partial differential equations, Sci. Adv, vol.3, 2017.

J. W. Ruge and K. Stüben, Algebraic multigrid, pp.73-130, 1987.

E. Sadrfaridpour, T. Razzaghi, and I. Safro, Engineering fast multilevel support vector machines, 2017.

H. Schaeffer, Learning partial differential equations via data discovery and sparse optimization, Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol.473, 2017.

C. E. Shannon, Communication in the presence of noise, Proc. IEEE, vol.86, pp.447-457, 1998.

Y. Shirvany, M. Hayati, and R. Moradian, Multilayer perceptron neural networks with novel unsupervised training method for numerical solution of the partial differential equations, Appl. Soft. Comput, vol.9, pp.20-29, 2009.

J. Takeuchi and Y. Kosugi, Neural network representation of finite element method, Neural. Netw, vol.7, pp.389-395, 1994.

U. Trottenberg, C. W. Oosterlee, and A. Schüller, Multigrid, 2000.

Z. Wen and D. Goldfarb, A line search multigrid method for large-scale nonlinear optimization, SIAM J. Optim, vol.20, pp.1478-1503, 2009.