%0 Conference Proceedings
%T A new parallelization scheme for the Hermite interpolation based gyroaverage operator
%+ TOkamaks and NUmerical Simulations (TONUS)
%+ Institut de Recherche sur la Fusion par confinement Magnétique (IRFM)
%+ Maison de la Simulation (MDLS)
%+ Réseaux, Moyens Informatiques, Calcul Scientifique (Remics)
%+ High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS)
%A Bouzat, Nicolas
%A Rozar, Fabien
%A Latu, Guillaume
%A Roman, Jean
%< peer-reviewed
%B ISPDC 2017 - 16th International Symposium on Parallel and Distributed Computing
%C Innsbruck, Austria
%I IEEE
%3 2017 16th International Symposium on Parallel and Distributed Computing (ISPDC)
%V 2017
%P 1-8
%8 2017-07-03
%D 2017
%R 10.1109/ISPDC.2017.12
%Z Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]; Conference papers
%X Gyrokinetic modeling is appropriate for describing plasma turbulence in the core of tokamaks, and the gyroaverage operator is a cornerstone of this approach. In a gyrokinetic code, the gyroaveraging scheme must be accurate enough, but it also requires a low computational cost because it is often applied to the main unknown, namely the 5D guiding-center distribution function, as well as to several 3D fields. The gyroaverage implementation used in the GYSELA code has recently been improved [6], enhancing the precision of the operator thanks to Hermite interpolation. In the present paper, we describe a new parallelization scheme for the gyroaverage operator. It mainly avoids costly transpositions of the full 5D function by using halo exchanges instead. Though the computational cost remains the same, the communication cost is much smaller. The overall algorithm is also improved by interleaving communications and computations, which reduces communication costs and allows a more efficient thread parallelization. With this algorithm, execution is up to twice as fast as with the previous version. The benefit of an improved scheme that overlaps communications with computations is also shown, further improving execution times. The algorithms are described, together with an analysis of the achieved performance.
%G English
%2 https://inria.hal.science/hal-01687727/document
%2 https://inria.hal.science/hal-01687727/file/paper.pdf
%L hal-01687727
%U https://inria.hal.science/hal-01687727
%~ CEA
%~ CNRS
%~ INRIA
%~ IRMA
%~ INRIA-BORDEAUX
%~ LMGC
%~ UNIV-STRASBG
%~ INRIA_TEST
%~ INRIA-LORRAINE
%~ INRIA-NANCY-GRAND-EST
%~ TESTALAIN1
%~ MDLS
%~ DSM-IRFM
%~ UVSQ
%~ TESTBORDEAUX
%~ INRIA2
%~ CEA-UPSAY
%~ UNIV-PARIS-SACLAY
%~ CEA-UPSAY-SACLAY
%~ UVSQ-SACLAY
%~ MIPS
%~ INRIA2017
%~ UNIV-MONTPELLIER
%~ CEA-DRF
%~ SITE-ALSACE
%~ CEA-CAD
%~ TEST-HALCNRS
%~ IRMAMOCO
%~ UVSQ-UPSACLAY
%~ GS-COMPUTER-SCIENCE
%~ UM-2015-2021