A Reproducible Accurate Summation Algorithm for High-Performance Computing

Caroline Collange; David Defour; Stef Graillat; Roman Iakymchuk

Conference paper. Year: 2014


Affiliations

Caroline Collange: Amdahl's Law is Forever

David Defour (ORCID 0000-0001-9923-2394): Digits, Architectures et Logiciels Informatiques

Stef Graillat: Performance et Qualité des Algorithmes Numériques

Roman Iakymchuk: Institut des Sciences du Calcul et des Données; Performance et Qualité des Algorithmes Numériques

Abstract

Floating-point (FP) addition is non-associative, and parallel reductions involving this operation are a serious issue, as noted in the DARPA Exascale Report [1]. Such large summations typically appear within fundamental numerical building blocks such as dot products or numerical integration. Hence, the result may vary from one parallel machine to another, or even from one run to another. These discrepancies worsen on heterogeneous architectures (such as clusters with GPUs or Intel Xeon Phi processors), which combine programming environments that may obey different floating-point models and offer different intermediate precisions or different operators [2,3]. Such non-determinism of floating-point calculations in parallel programs causes validation and debugging issues, and may even lead to deadlocks [4].

The increasing power of current computers enables one to solve more and more complex problems. Consequently, a higher number of floating-point operations must be performed, each of them potentially causing a round-off error. Because of round-off error propagation, some problems must be solved with a wider floating-point format.

Two approaches exist to perform floating-point addition without incurring round-off errors. The first approach computes the error that occurs during rounding by means of FP expansions, which are based on error-free transformations. An FP expansion represents the result as an unevaluated sum of a fixed number of FP numbers, ordered by magnitude and with minimal overlap between components, so as to cover a wide range of exponents. FP expansions of sizes 2 and 4 are presented in [5] and [6], respectively. The main advantage of this solution is that the expansion can stay in registers during the computation. However, its accuracy is insufficient for the summation of numerous FP numbers or for sums with a huge dynamic range. Moreover, its complexity grows linearly with the size of the expansion.
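To make the first approach concrete, the sketch below uses Knuth's TwoSum, a standard error-free transformation. It returns the rounded sum s = fl(a + b) together with the exact rounding error e, so that a + b = s + e holds exactly. The code is Python, whose float is an IEEE 754 binary64 number on mainstream platforms. The helper names `two_sum`, `naive_sum` and `expansion2_sum` are hypothetical. `expansion2_sum` follows the simple Sum2 scheme of Ogita, Rump and Oishi rather than the specific algorithms of [5] or [6]. A hand-written loop is used for naive summation because, since Python 3.12, the built-in `sum()` applies compensated summation to floats.

```python
import math


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum error-free transformation.

    Returns (s, e) with s = fl(a + b) and a + b == s + e exactly,
    assuming binary64 arithmetic, round-to-nearest and no overflow.
    """
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def naive_sum(values) -> float:
    """Plain left-to-right recursive summation, rounding at every step."""
    total = 0.0
    for x in values:
        total += x
    return total


def expansion2_sum(values) -> float:
    """Accumulate into a size-2 expansion (hi, lo), as in Sum2.

    hi is the running rounded sum; lo gathers the exact rounding errors
    returned by two_sum. lo itself is accumulated with ordinary rounding,
    so the final hi + lo is not exact in general.
    """
    hi, lo = 0.0, 0.0
    for x in values:
        hi, e = two_sum(hi, x)
        lo += e
    return hi + lo


# Addition order changes the rounded result.
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6

# The expansion recovers a digit that naive summation loses...
print(naive_sum([1e16, 1.0, -1e16]))       # 0.0
print(expansion2_sum([1e16, 1.0, -1e16]))  # 1.0

# ...but a huge dynamic range defeats a size-2 expansion.
wide = [1e100, 1e50, 1.0, -1e100, -1e50]
print(naive_sum(wide))       # -1e+50
print(expansion2_sum(wide))  # 0.0
print(math.fsum(wide))       # 1.0 (correctly rounded exact sum)
```

The last example illustrates both claims about expansions. The two-term expansion fixes a simple cancellation, but its tail is itself rounded, so error terms spread over a huge exponent range are lost.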
An alternative approach to expansions exploits the finite range of representable floating-point numbers by storing every bit in a very long vector of bits (an accumulator). The length of the accumulator is chosen such that every bit of information of the input format can be represented; this covers the range from the minimum to the maximum representable floating-point value, independently of the sign. For instance, Kulisch [7] proposed to use an accumulator of 4288 bits to handle the accumulation of products of 64-bit IEEE floating-point values. The Kulisch accumulator produces the exact sum of a very large number of floating-point numbers of arbitrary magnitude. However, for a long time this approach was considered impractical, as it induces a very large memory overhead. Furthermore, without dedicated hardware support, its performance is limited by indirect memory accesses that make vectorization challenging.

We aim at addressing both issues, accuracy and reproducibility, in the context of summation. We advocate computing the correctly-rounded result of the exact sum. Besides offering strict reproducibility through an unambiguous definition of the expected result, our approach guarantees that the result has
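The long-accumulator idea can be sketched in Python, whose unbounded `int` plays the role of the very long vector of bits. For sums of binary64 values (rather than the products targeted by Kulisch's 4288-bit design), the relevant bit positions run from weight 2^-1074 (the smallest subnormal) up to 2^1023. Every finite input is therefore an exact integer multiple of 2^-1074. The names `LongAccumulator`, `SCALE_BITS` and `exact_sum` are hypothetical. The single final rounding relies on CPython's `int / int` true division returning the nearest double. This is only an illustration of the principle: an HPC implementation would use fixed-size machine words with explicit carry propagation instead of a big integer.

```python
import math
import random

# Weight of the least significant accumulator bit: 2**-1074, the smallest
# positive subnormal binary64 value. Every finite double is an integer
# multiple of this quantum, so it can be added to the accumulator exactly.
SCALE_BITS = 1074


class LongAccumulator:
    """Exact accumulator for binary64 values (hypothetical sketch).

    Python's unbounded int stands in for the very long vector of bits.
    """

    def __init__(self) -> None:
        self.acc = 0  # exact running sum, in units of 2**-1074

    def add(self, x: float) -> None:
        if not math.isfinite(x):
            raise ValueError("this sketch only accepts finite values")
        # x == num / den exactly, with den a power of two no larger than 2**1074.
        num, den = x.as_integer_ratio()
        self.acc += num * ((1 << SCALE_BITS) // den)

    def result(self) -> float:
        # A single rounding at the very end: CPython's int / int true
        # division returns the nearest double (ties to even), and raises
        # OverflowError if the exact sum is out of the binary64 range.
        return self.acc / (1 << SCALE_BITS)


def exact_sum(values) -> float:
    """Return the exact total of `values`, rounded once to the nearest double."""
    acc = LongAccumulator()
    for v in values:
        acc.add(v)
    return acc.result()


print(exact_sum([1e100, 1e50, 1.0, -1e100, -1e50]))  # 1.0

# The exact total does not depend on the summation order, so neither does
# the rounded result: shuffling the inputs never changes it.
rng = random.Random(2014)
data = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-300, 300) for _ in range(1000)]
reference = exact_sum(data)
for _ in range(5):
    rng.shuffle(data)
    assert exact_sum(data) == reference == math.fsum(data)
```

Because the accumulator holds the exact sum and rounds only once, its result is both correctly rounded and independent of the order in which the inputs arrive.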

Keywords

Multi- and many-core architectures; multi-precision; parallel floating-point summation; reproducibility; accuracy; long accumulator

Domains

Hardware Architecture [cs.AR]

Main file

10-iakymchuk-abstract.pdf (89.55 KB)

Origin: files produced by the author(s)


https://hal.science/hal-01267825

Submitted on: Thursday 4 February 2016, 20:30:36

Last modified on: Tuesday 11 April 2023, 15:16:28

Long-term archiving on: Saturday 12 November 2016, 09:50:20

Dates and versions

hal-01267825, version 1 (04-02-2016)

Identifiers

HAL Id: hal-01267825, version 1

Cite

Caroline Collange, David Defour, Stef Graillat, Roman Iakymchuk. A Reproducible Accurate Summation Algorithm for High-Performance Computing. EX: Exascale Applied Mathematics Challenges and Opportunities, Jul 2014, Chicago, United States. ⟨hal-01267825⟩


