Numerical Reproducibility for the Parallel Reduction on Multi- and Many-Core Architectures

On modern multi-core, many-core, and heterogeneous architectures, floating-point computations, especially reductions, may become non-deterministic and, therefore, non-reproducible mainly due to the non-associativity of floating-point operations. We introduce an approach to compute the correctly rounded sums of large floating-point vectors accurately and efficiently, achieving deterministic results by construction. Our multi-level algorithm consists of two main stages: a filtering stage that relies on fast vectorized floating-point expansions, and an accumulation stage based on superaccumulators in a high-radix carry-save representation. We present implementations on recent Intel desktop and server processors, Intel Xeon Phi accelerators, and both AMD and NVIDIA GPUs. We show that numerical reproducibility and bit-perfect accuracy can be achieved at no additional cost for large sums that have dynamic ranges of up to 90 orders of magnitude by leveraging arithmetic units that are left underused by standard reduction algorithms.
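The non-associativity mentioned in the abstract, and the idea of accumulating exactly and rounding only once, can be illustrated with a small sketch. This is not the authors' multi-level algorithm: the function names (`two_sum`, `naive_sum`, `exact_sum`) are hypothetical, and a Python big integer stands in for the high-radix carry-save superaccumulator, under the assumption of finite binary64 inputs on CPython.

```python
from itertools import permutations

# 2**-1074 is the smallest positive binary64 value, so every finite double
# is an integer multiple of it (assumption: IEEE 754 binary64 floats).
SCALE = 1 << 1074


def two_sum(a, b):
    """Knuth's TwoSum error-free transformation.

    Returns (s, e) where s is the rounded sum a + b and e is the rounding
    error, so that s + e equals a + b exactly (barring overflow).
    """
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    return s, (a - a_virtual) + (b - b_virtual)


def naive_sum(values):
    """Left-to-right floating-point sum; the result depends on the order."""
    total = 0.0
    for x in values:
        total += x
    return total


def exact_sum(values):
    """Order-independent sum of finite floats, rounded once at the end.

    A big integer plays the role of a long fixed-point accumulator here;
    it is an illustrative stand-in, not a carry-save superaccumulator.
    """
    acc = 0
    for x in values:
        n, d = x.as_integer_ratio()  # x == n / d exactly, d a power of two
        acc += n * (SCALE // d)      # exact integer multiple of 2**-1074
    # CPython's int / int true division rounds the exact quotient once.
    return acc / SCALE


if __name__ == "__main__":
    data = [1e100, 1.0, -1e100, 1.0]
    print(naive_sum(data))                       # 1.0
    print(naive_sum([1.0, 1.0, 1e100, -1e100]))  # 0.0
    print({exact_sum(p) for p in permutations(data)})  # {2.0}
    print(two_sum(1e100, 1.0))                   # (1e+100, 1.0)
```

The naive loop returns different values for two orderings of the same data, whereas the exact accumulator returns 2.0 for every permutation. `two_sum` shows the building block behind floating-point expansions: the low-order part that a plain addition discards is recovered as a second floating-point number.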

Keywords

Accuracy; Kulisch long accumulator; Error-free transformations; Reproducibility; Parallel floating-point summation; Multi- and many-core architectures

Domains

Computer arithmetic; Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

superaccumulator.pdf (405.75 KB)

Origin: Files produced by the author(s)

Contributor: Roman Iakymchuk

https://hal.science/hal-00949355

Submitted on: Thursday, September 10, 2015, 15:49:43

Last modified on: Tuesday, April 11, 2023, 15:16:28

Long-term archiving on: Tuesday, December 29, 2015, 00:05:21

Dates and versions

hal-00949355, version 1 (2014-02-25)

hal-00949355, version 2 (2014-02-28)

hal-00949355, version 3 (2015-02-09)

hal-00949355, version 4 (2015-09-10)

Identifiers

HAL Id: hal-00949355, version 4

Cite

Caroline Collange, David Defour, Stef Graillat, Roman Iakymchuk. Numerical Reproducibility for the Parallel Reduction on Multi- and Many-Core Architectures. 2015. ⟨hal-00949355v4⟩


Collections

INSTITUT-TELECOM UPMC UNIV-RENNES1 CNRS INRIA UNIV-PERP INSA-RENNES IRISA LIP6 DALI LIRMM CENTRALESUPELEC IRISA-D3 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC MIPS UNIV-MONTPELLIER UNIV-RENNES SORBONNE-UNIVERSITE SU-SCIENCES FED-3 ISCD ANR UR1-MATH-NUM

2329 views

3471 downloads