Recursion based parallelization of exact dense linear algebra routines for Gaussian elimination

We present block algorithms and their implementation for the parallelization of sub-cubic Gaussian elimination on shared memory architectures. In contrast to the classical cubic algorithms of parallel numerical linear algebra, we focus on recursive algorithms and coarse-grain parallelization. Indeed, sub-cubic matrix arithmetic can only be achieved through recursive algorithms, which makes coarse-grain block algorithms more efficient than fine-grain ones. This work is motivated by the design and implementation of dense linear algebra over a finite field, where fast matrix multiplication is used extensively and where costly modular reductions also favor coarse-grain block decomposition. We incrementally build efficient kernels, first for matrix multiplication, then for triangular system solving, on top of which a recursive PLUQ decomposition algorithm is built. We study the parallelization of these kernels using several algorithmic variants, either iterative or recursive, with different splitting strategies. Experiments show that recursive adaptive methods for matrix multiplication, hybrid recursive-iterative methods for triangular system solving, and tile recursive versions of the PLUQ decomposition, together with various data mapping policies, provide the best performance on a 32-core NUMA architecture. Overall, we show that the overhead of modular reductions is more than compensated for by the fast linear algebra algorithms, and that exact dense linear algebra matches the performance of full-rank reference numerical software even in the presence of rank deficiencies.
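To illustrate why coarse-grain blocks suit arithmetic over a finite field, here is a minimal Python sketch of a recursive block matrix product modulo a prime p. It is not the authors' implementation. The function names (`matmul_mod`, `matmul_mod_base`), the `threshold` parameter, and the rule of always halving the largest dimension are assumptions made for illustration. For brevity it uses the classical cubic product rather than the sub-cubic multiplication the abstract refers to. Each base-case entry is accumulated exactly and reduced modulo p only once, so larger blocks mean fewer reductions per arithmetic operation.

```python
def matmul_mod_base(A, B, p):
    """Base case: accumulate each dot product exactly, then reduce modulo p once."""
    cols = list(zip(*B))  # columns of B
    return [[sum(a * b for a, b in zip(row, col)) % p for col in cols] for row in A]


def matmul_mod(A, B, p, threshold=16):
    """Recursive block product of A (n x k) and B (k x m) over Z/pZ.

    Matrices are non-empty lists of rows. The largest of the three
    dimensions is halved at each level until all are <= threshold.
    """
    n, k, m = len(A), len(B), len(B[0])
    if max(n, k, m) <= threshold:
        return matmul_mod_base(A, B, p)
    if n >= k and n >= m:
        # Split the rows of A: two independent sub-products, stacked vertically.
        h = n // 2
        return matmul_mod(A[:h], B, p, threshold) + matmul_mod(A[h:], B, p, threshold)
    if m >= k:
        # Split the columns of B: two independent sub-products, joined side by side.
        h = m // 2
        left = matmul_mod(A, [row[:h] for row in B], p, threshold)
        right = matmul_mod(A, [row[h:] for row in B], p, threshold)
        return [l + r for l, r in zip(left, right)]
    # Split the inner dimension: the two partial products must be summed.
    h = k // 2
    c1 = matmul_mod([row[:h] for row in A], B[:h], p, threshold)
    c2 = matmul_mod([row[h:] for row in A], B[h:], p, threshold)
    return [[(x + y) % p for x, y in zip(r1, r2)] for r1, r2 in zip(c1, c2)]


if __name__ == "__main__":
    print(matmul_mod([[1, 2], [3, 4]], [[5, 6], [7, 8]], 7, threshold=1))
```

In the row and column splits the two recursive calls are independent, so they are natural candidates for parallel tasks; the sketch runs them sequentially. The inner-dimension split instead requires a final addition, which is a synchronization point.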

Keywords

Dataflow task dependencies; NUMA architecture; Finite field; Parallel shared memory computation; Rank deficiencies; PLUQ decomposition

Domains

Parallel, distributed and shared computing [cs.DC]; Algorithms and data structures [cs.DS]; Mathematical software [cs.MS]; Performance and reliability [cs.PF]; Symbolic computation [cs.SC]

Main file

parco_DumasGautierPernetRochSultan.pdf (526.23 KB)

Origin: Files produced by the author(s)

Contributor: Clément Pernet

https://hal.science/hal-01084238

Submitted on: Thursday, 24 September 2015, 16:51:37

Last modified on: Thursday, 4 April 2024, 21:01:13

Long-term archiving on: Wednesday, 26 April 2017, 18:32:37

Dates and versions

hal-01084238, version 1 (18-11-2014)

hal-01084238, version 2 (24-09-2015)

License

Attribution

Identifiers

HAL Id: hal-01084238, version 2
DOI: 10.1016/j.parco.2015.10.003

Cite

Jean-Guillaume Dumas, Thierry Gautier, Clément Pernet, Jean-Louis Roch, Ziad Sultan. Recursion based parallelization of exact dense linear algebra routines for Gaussian elimination. Parallel Computing, 2016, 57, pp.235-249. ⟨10.1016/j.parco.2015.10.003⟩. ⟨hal-01084238v2⟩

Collections

ENS-LYON UGA CNRS INRIA UNIV-LYON1 LIG LJK LJK_MAD LJK_MAD_CASYS INRIA2 UDL ANR LIG_SIDCH
