L. Lagardère, L. H. Jolly, F. Lipparini, F. Aviat, B. Stamm et al., Tinker-HP: a Massively Parallel Molecular Dynamics Package for Multiscale Simulations of Large Complex Systems with Advanced Polarizable Force Fields, Chemical Science, vol.9, pp.956-972, 2018.

J. A. Rackers, Z. Wang, C. Lu, M. L. Laury, L. Lagardère et al., Tinker 8: Software Tools for Molecular Design, J Chem Theory Comput, vol.14, issue.10, PMID: 30176213, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01820747

B. M. Shabanov, A. A. Rybakov, and S. S. Shumilin, Vectorization of High-Performance Scientific Calculations Using AVX-512 Instruction Set, Lobachevskii Journal of Mathematics, vol.40, issue.5, pp.580-598, 2019.

A. Mathuriya, Y. Luo, R. C. Clay III, A. Benali et al., Embracing a New Era of Highly Efficient and Productive Quantum Monte Carlo Simulations, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17), vol.38, p.12, 2017.

P. Y. Ren and J. W. Ponder, Polarizable Atomic Multipole Water Model for Molecular Mechanics Simulation, J Phys Chem B, vol.107, issue.24, pp.5933-5947, 2003.

Y. Shi, Z. Xia, J. Zhang, R. Best, J. W. Ponder et al., The Polarizable Atomic Multipole-Based AMOEBA Force Field for Proteins, J Chem Theory Comput, vol.9, issue.9, pp.4046-4063, 2013.

C. Zhang, C. Lu, Z. Jing, C. Wu, J. P. Piquemal et al., AMOEBA Polarizable Atomic Multipole Force Field for Nucleic Acids, J Chem Theory Comput, vol.14, pp.2084-2108, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02126772

A. D. MacKerell, D. Bashford, M. Bellott, R. L. Dunbrack, J. D. Evanseck et al., All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins, The Journal of Physical Chemistry B, vol.102, issue.18, PMID: 24889800, 1998.

J. Wang, R. M. Wolf, J. W. Caldwell, P. A. Kollman, and D. A. Case, Development and testing of a general AMBER force field, J Comput Chem, vol.25, pp.1157-1174, 2004.

W. L. Jorgensen, D. S. Maxwell, and J. Tirado-Rives, Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids, J Am Chem Soc, vol.118, pp.11225-11236, 1996.

Intel Corp., Intel Architecture Instruction Set Extensions Programming Reference, 2014.

S. Maleki, Y. Gao, M. J. Garzaran, T. Wong, and D. A. Padua, An evaluation of vectorizing compilers, Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT '11), pp.372-382, 2011.

Intel Corp., Vectorization Advisor, 2019.

Intel Corp., Intel Parallel Studio XE, 2019.

L. S. Blackford, A. Petitet, R. Pozo, R. K. Whaley, R. C. Demmel et al., An updated set of basic linear algebra subprograms (BLAS), ACM Transactions on Mathematical Software, vol.28, issue.2, pp.135-151, 2002.

E. Anderson, Z. Bai, J. Dongarra, A. Greenbaum, A. McKenney et al., LAPACK: A Portable Linear Algebra Library for High-Performance Computers, Proceedings of the 1990 ACM/IEEE Conference on Supercomputing (Supercomputing '90), pp.2-11, 1990.

M. Frigo and S. G. Johnson, The design and implementation of FFTW3, Proceedings of the IEEE, vol.93, issue.2, pp.216-231, 2005.

OpenMP Architecture Review Board, OpenMP Application Programming Interface Version 5.0, 2018.

M. Tuckerman, B. J. Berne, and G. J. Martyna, Reversible multiple time scale molecular dynamics, J Chem Phys, vol.97, issue.3, pp.1990-2001, 1992.

J. W. Ponder, C. Wu, P. Y. Ren, V. S. Pande, J. D. Chodera et al., Current Status of the AMOEBA Polarizable Force Field, J Phys Chem B, vol.114, issue.8, pp.2549-2564, 2010.

J. P. Piquemal, L. Perera, G. A. Cisneros, P. Ren, L. G. Pedersen et al., Towards accurate solvation dynamics of divalent cations in water using the polarizable AMOEBA force field: From energetics to structure, The Journal of Chemical Physics, vol.125, issue.5, p.054511, 2006.
URL : https://hal.archives-ouvertes.fr/hal-02126806

J. C. Wu, J. P. Piquemal, R. Chaudret, P. Reinhardt, and P. Ren, Polarizable Molecular Dynamics Simulation of Zn(II) in Water Using the AMOEBA Force Field, J Chem Theory Comput, vol.6, pp.2059-2070, 2010.
URL : https://hal.archives-ouvertes.fr/hal-02126833

A. Marjolin, C. Gourlaouen, C. Clavaguéra, P. Y. Ren, J. C. Wu et al., Toward accurate solvation dynamics of lanthanides and actinides in water using polarizable force fields: from gas-phase energetics to hydration free energies, Theoretical Chemistry Accounts, vol.131, issue.4, p.1198, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00904389

A. Marjolin, C. Gourlaouen, C. Clavaguéra, P. Y. Ren, J. P. Piquemal et al., Hydration Gibbs free energies of open and closed shell trivalent lanthanide and actinide cations from polarizable molecular dynamics, Journal of Molecular Modeling, vol.20, issue.10, p.2471, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01157657

H. Watanabe and K. M. Nakagawa, SIMD Vectorization for the Lennard-Jones Potential with AVX2 and AVX-512 instructions, CoRR, 2018.

F. Célerse, L. Lagardère, E. Derat, and J. P. Piquemal, Massively Parallel Implementation of Steered Molecular Dynamics in Tinker-HP: Comparisons of Polarizable and Non-Polarizable Simulations of Realistic Systems, Journal of Chemical Theory and Computation, vol.15, issue.6, PMID: 31059250, 2019.

M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith et al., GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers, SoftwareX, vol.1-2, pp.19-25, 2015.

J. C. Phillips, R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid et al., Scalable molecular dynamics with NAMD, Journal of Computational Chemistry, vol.26, issue.16, pp.1781-1802, 2005.

C. Kobayashi, J. Jung, Y. Matsunaga, T. Mori, T. Ando et al., GENESIS 1.1: A hybrid-parallel molecular dynamics simulator with enhanced sampling algorithms on multiple computational platforms, J Comput Chem, vol.38, pp.2193-2206, 2017.

K. J. Bowers, E. Chow, H. Xu, R. O. Dror, M. P. Eastwood et al., Scalable Algorithms for Molecular Dynamics Simulations on Commodity Clusters, Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC '06), 2006.

L. Lagardère, F. Aviat, and J. P. Piquemal, Pushing the Limits of Multiple-Time-Step Strategies for Polarizable Point Dipole Molecular Dynamics, The Journal of Physical Chemistry Letters, vol.10, pp.2593-2599, 2019.

F. Aviat, A. Levitt, Y. Maday, B. Stamm, P. Y. Ren et al., Truncated Conjugate Gradient (TCG): an optimal strategy for the analytical evaluation of the many-body polarization energy and forces in molecular simulations, J Chem Theory Comput, vol.13, pp.180-190, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01395833

F. Aviat, L. Lagardère, and J. P. Piquemal, The Truncated Conjugate Gradient (TCG), a Non-iterative/Fixed-cost Strategy for Computing Polarization in Molecular Dynamics: Fast Evaluation of Analytical Forces, J Chem Phys, vol.147, p.161724, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01571663

D. Nocito and G. Beran, Massively Parallel Implementation of Divide-and-Conquer Jacobi Iterations Using Particle-Mesh Ewald for Force Field Polarization, Journal of Chemical Theory and Computation, vol.14, issue.7, PMID: 29847125, 2018.

N. Gresh, G. A. Cisneros, T. A. Darden, and J. P. Piquemal, Anisotropic, polarizable molecular mechanics studies of inter-, intra-molecular interactions, and ligand-macromolecule complexes. A bottom-up strategy, J Chem Theory Comput, vol.3, issue.6, pp.1960-1986, 2007.

J. P. Piquemal, H. Chevreau, and N. Gresh, Toward a Separate Reproduction of the Contributions to the Hartree-Fock and DFT Intermolecular Interaction Energies by Polarizable Molecular Mechanics with the SIBFA Potential, Journal of Chemical Theory and Computation, vol.3, issue.3, PMID: 26627402, 2007.
URL : https://hal.archives-ouvertes.fr/hal-02126810

C. Liu, J. P. Piquemal, and P. Ren, AMOEBA+ Classical Potential for Modeling Molecular Interactions, Journal of Chemical Theory and Computation, vol.15, issue.7, PMID: 31136175, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02142886

J. A. Rackers, Q. Wang, C. Liu, J. P. Piquemal, P. Ren et al., An optimized charge penetration model for use with the AMOEBA force field, Phys Chem Chem Phys, vol.19, pp.276-291, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01405847

J. A. Rackers and J. W. Ponder, Classical Pauli repulsion: An anisotropic, atomic multipole model, The Journal of Chemical Physics, vol.150, issue.8, p.084104, 2019.

List of Figures

Memory layout of a running process. Arrows give the directions in which the zones expand.

Figure 6. Performance gain with CHARMM force field for the DHFR using Rel or Vec. The boost factor remains almost constant when increasing the number of cores.

List of Tables

Table 1. MS used for the performance measurements. The numbers of cores are taken from [1] for comparison. The CPU2 row gives the number of cores which produced the best performance (see tables 5, 6 and 7).

Table 2. Profiling of Rel using Intel VTune Amplifier. Simulations ran on one core for 100 steps. MS is DHFR with AMOEBA polarizable force field and with CHARMM force field (no polarization). The most important NUC and computational hotspots are shown in separate frames. vmlinux is the system kernel, performing memory operations and system calls. For the CHARMM calculation, the starred lines are counted in the total CPU time for comparison with Vec. The ? on some lines indicates routines that have not been vectorized in Vec; they therefore do not count in the total CPU time for comparison.

Table 3. Profiling of Vec using Intel VTune Amplifier. Simulations ran on one core for 100 steps. MS is DHFR with AMOEBA polarizable force field and with CHARMM force field (no polarization).

Table 7. Best production performances for the different MS using Rel2, Rel2-multi (multi-timestep) and Vec2-multi (multi-timestep). For DHFR, COX-2, STMV and Ribosome, optimal results with the CPU2 setup are also shown (see table 1).

Listings

Flags used for the compilation of Tinker-HP with the Intel Fortran compiler.

Listing 3. Typical array declarations in a module with alignment directives. Integer arrays precede real*8 arrays. Arrays are ordered according to their use wherever possible.

Listing 6. Vectorization report for the mask creation. Recall that the reported speedup is not in execution time but in number of operations.

Vectorization report for the selection loop (PACK version).

Listing 9. Final selection loop with no PACK function.

Listing 12. A typical compute loop. Starting from the already available rik2vec and rikvec, it computes all the powers of rikvec and the intermediate quantities needed by the Halgren buffered function. Notice that there are 8 instructions and 11 different array references.

Listing 14. Typical calculation loop assembly code showing vector-only operations. Loads and stores have been optimized.

Listing 15. Excerpt of a vectorization report for the compute loop with division.