Parallel Locality and Parallelization Quality

Bernard Goossens; David Parello; Katarzyna Porada; Djallal Rahmoune

doi:10.1145/2883404.2883410

Conference paper, Year: 2016


Bernard Goossens

Role: Author
PersonId : 938922

Digits, Architectures et Logiciels Informatiques

Université de Perpignan Via Domitia

David Parello

Role: Author
PersonId : 6914
IdHAL : david-parello
IdRef : 083867767

Digits, Architectures et Logiciels Informatiques

Université de Perpignan Via Domitia

Katarzyna Porada

Role: Author
PersonId : 14518
IdHAL : katkacyt
IdRef : 229985351

Digits, Architectures et Logiciels Informatiques

Université de Perpignan Via Domitia

Djallal Rahmoune

Role: Author
PersonId : 1014679

Digits, Architectures et Logiciels Informatiques

Université de Perpignan Via Domitia

Abstract

This paper presents a new distributed computation model adapted to manycore processors. In this model, the run is spread across the available cores by fork machine instructions produced by the compiler, for example at function calls and loop iterations. This approach contrasts with the current model of computation, which is based on caches and predictors. Cache efficiency relies on data locality, and predictor efficiency relies on the reproducibility of control flow. Both data locality and control reproducibility are less effective when the execution is distributed. The proposed computation model is based on new core hardware, whose main features are described in this paper. This new core is the building block of a manycore design. The processor automatically parallelizes an execution. It keeps the computation deterministic by constructing a totally ordered trace of the machine instructions run. References are renamed, including memory references, which fixes the communication and synchronization needs. When a datum is referenced, its producer is found in the trace and the reader is synchronized with the writer. This paper shows how a consumer can be located in the same core as its producer, improving parallel locality and parallelization quality. Our deterministic, fine-grain distribution of a run on a manycore processor is compared with parallelization based on OS primitives and APIs (e.g. pthread, OpenMP or MPI) and with automatic compiler parallelization of loops. The former implies (i) a high OS overhead, meaning that only coarse-grain parallelization is cost-effective, and (ii) non-deterministic behaviour, meaning that providing appropriate synchronization to eliminate wrong results is a challenge. The latter is unable to fully parallelize general-purpose programs because of structures such as functions, complex loops and branches.
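The producer lookup described in the abstract can be illustrated in software. The following is only a minimal sketch under stated assumptions, not the hardware mechanism of the paper: the names `Instr`, `find_producers`, and `locality`, the `core` field, and the sample trace are all hypothetical. It walks a totally ordered trace, maps each read to its most recent earlier writer, and then measures what fraction of producer-consumer pairs sit on the same core.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """One entry of a totally ordered trace (hypothetical representation)."""
    core: int                      # core assumed to run this instruction
    dest: Optional[str]            # location written (register or memory name)
    srcs: Tuple[str, ...] = ()     # locations read


def find_producers(trace: List[Instr]) -> List[Dict[str, Optional[int]]]:
    """For each instruction, map every source to the trace index of its
    most recent earlier writer (None if nothing in the trace wrote it)."""
    last_writer: Dict[str, int] = {}
    producers = []
    for index, instr in enumerate(trace):
        # Reads are resolved before this instruction's own write is recorded.
        producers.append({s: last_writer.get(s) for s in instr.srcs})
        if instr.dest is not None:
            last_writer[instr.dest] = index
    return producers


def locality(trace: List[Instr],
             producers: List[Dict[str, Optional[int]]]) -> float:
    """Fraction of producer-consumer pairs placed on the same core."""
    pairs = [(trace[p].core, instr.core)
             for instr, deps in zip(trace, producers)
             for p in deps.values() if p is not None]
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical trace: x is written twice, so the last read of x must be
# bound to the second writer, not the first.
TRACE = [
    Instr(core=0, dest="x"),
    Instr(core=0, dest="y", srcs=("x",)),
    Instr(core=1, dest="x"),
    Instr(core=1, dest="z", srcs=("x", "y")),
]
```

In this sample trace, the last instruction reads `x` from index 2 and `y` from index 1, so two of the three producer-consumer pairs are local to one core. Moving a consumer onto its producer's core raises that ratio, which is the intuition behind parallel locality as the abstract uses the term.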

Domains

Hardware Architecture [cs.AR]; Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

pmam_2016.pdf (245.26 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-01252007

Submitted on: Thursday, January 7, 2016, 10:12:31

Last modified on: Friday, March 24, 2023, 14:53:01

Long-term archiving on: Friday, April 8, 2016, 13:13:15

Dates and versions

hal-01252007, version 1 (07-01-2016)

License

Attribution - NoDerivatives

Identifiers

HAL Id: hal-01252007, version 1
DOI: 10.1145/2883404.2883410

Cite

Bernard Goossens, David Parello, Katarzyna Porada, Djallal Rahmoune. Parallel Locality and Parallelization Quality. PMAM: Programming Models and Applications for Multicores and Manycores, Mar 2016, Barcelona, Spain. pp. 59-68, ⟨10.1145/2883404.2883410⟩. ⟨hal-01252007⟩


Collections

CNRS UNIV-PERP DALI LIRMM MIPS UNIV-MONTPELLIER

