Automatic, Abstracted and Portable Topology-Aware Thread Placement

Abstract: Efficiently programming shared-memory machines is a difficult challenge because the mapping of application threads onto the memory hierarchy has a strong impact on performance. However, optimizing such thread placement is difficult: architectures are becoming increasingly complex, and application behavior changes with implementations and input parameters, e.g., problem size and number of threads. In this work, we propose a fully automatic, abstracted and portable affinity module. It produces and implements an optimized affinity strategy that combines knowledge about application characteristics and the platform topology. Implemented in the back-end of our runtime system (ORWL), our approach was used to enhance the performance and the scalability of several unmodified ORWL-coded applications: matrix multiplication, a 2D stencil (Livermore Kernel 23), and a real-world video tracking application. On two SMP machines with quite different hardware characteristics, our tests show spectacular performance improvements for these unmodified application codes, due to a dramatic decrease in cache misses and pipeline stalls. A comparison to reference implementations using OpenMP confirms this performance gain of almost one order of magnitude.
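To make the idea of combining application characteristics with platform topology concrete, here is a minimal sketch in Python. It is not the method used in the paper or in ORWL: it is a simple greedy heuristic chosen for illustration. The function name `place_threads`, the communication matrix `comm`, and the `core_groups` description of cores sharing a cache are all hypothetical inputs assumed for this example.

```python
from itertools import combinations


def place_threads(comm, core_groups):
    """Greedy, illustrative placement: returns {thread index: core id}.

    comm[i][j]  : assumed communication volume between threads i and j
    core_groups : lists of core ids that share a cache (assumed topology)
    """
    n = len(comm)
    if n > sum(len(g) for g in core_groups):
        raise ValueError("more threads than cores")

    # Visit thread pairs from heaviest to lightest communication.
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: comm[p[0]][p[1]], reverse=True)

    free = [list(g) for g in core_groups]  # remaining cores per group
    group_of = {}                          # thread -> group index
    placement = {}                         # thread -> core id

    def assign(thread, group):
        placement[thread] = free[group].pop(0)
        group_of[thread] = group

    def roomiest():
        # Group with the most free cores (first one wins on ties).
        return max(range(len(free)), key=lambda g: len(free[g]))

    for a, b in pairs:
        for thread, partner in ((a, b), (b, a)):
            if thread in placement:
                continue
            group = group_of.get(partner)
            # Prefer the partner's group while it still has a free core.
            if group is None or not free[group]:
                group = roomiest()
            assign(thread, group)

    # A single thread forms no pair; place anything still unassigned.
    for thread in range(n):
        if thread not in placement:
            assign(thread, roomiest())
    return placement
```

With four threads where threads 0 and 2 exchange the most data, as do threads 1 and 3, and two groups of two cores each, this sketch puts each heavily communicating pair on cores that share a cache. On Linux, a thread could then pin itself to its assigned core with `os.sched_setaffinity(0, {core})`; that step is left out here because it is platform-specific.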
Document type:
Conference paper
IEEE Cluster, Sep 2017, Hawaii, United States. pp. 389-399, 2017, Cluster Computing (CLUSTER), 2017 IEEE International Conference on. 〈10.1109/CLUSTER.2017.71〉

https://hal.archives-ouvertes.fr/hal-01621936
Contributor: Farouk Mansouri
Submitted on: Wednesday, October 25, 2017 - 21:30:22
Last modified on: Saturday, October 27, 2018 - 01:26:50
Document(s) archived on: Friday, January 26, 2018 - 12:23:13

File

IEEE_Cluster_2017_Paper_278.pd...
Files produced by the author(s)

Citation

Jens Gustedt, Emmanuel Jeannot, Farouk Mansouri. Automatic, Abstracted and Portable Topology-Aware Thread Placement. IEEE Cluster, Sep 2017, Hawaii, United States. pp. 389-399, 2017, Cluster Computing (CLUSTER), 2017 IEEE International Conference on. 〈10.1109/CLUSTER.2017.71〉. 〈hal-01621936〉
