Reproducible and Accurate Matrix Multiplication for GPU Accelerators - HAL open archive
Preprint, Working Paper. Year: 2015

Abstract

Due to the non-associativity of floating-point operations and dynamic scheduling on parallel architectures, obtaining a bitwise reproducible floating-point result across multiple executions of the same code on different, or even similar, parallel architectures is challenging. In this paper, we address the problem of reproducibility in the context of matrix multiplication and propose an algorithm that yields both reproducible and accurate results. This algorithm is composed of two main stages: a filtering stage that uses fast vectorized floating-point expansions in conjunction with error-free transformations, and an accumulation stage based on Kulisch long accumulators in a high-radix carry-save representation. Finally, we provide implementations and performance results in parallel environments such as GPUs.
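To make the two stages more concrete, here is a minimal sequential Python sketch written under our own assumptions; it is not the authors' vectorized GPU implementation. It uses Knuth's TwoSum and Dekker's TwoProd (with Veltkamp splitting) as the error-free transformations, a fixed-size expansion of `p` terms as the filtering stage (`p = 3` is an arbitrary choice), and a list of signed radix-2^32 digits with lazy carry propagation standing in for the Kulisch long accumulator's carry-save words. All names (`LongAccumulator`, `exact_dot`, `matmul`, `NORMALIZE_EVERY`, and so on) and constants are hypothetical, and the sketch assumes finite inputs with no overflow or underflow inside the transformations.

```python
import random
from fractions import Fraction

# Hypothetical parameters for this sketch (not taken from the paper).
RADIX_BITS = 32              # high radix: one "digit" holds 32 bits
MASK = (1 << RADIX_BITS) - 1
SHIFT = 1074                 # every finite double is a multiple of 2**-1074
NDIGITS = 70                 # 70 * 32 = 2240 bits > 1024 + 1074, plus headroom
NORMALIZE_EVERY = 1 << 20    # propagate carries only occasionally (carry-save)

def two_sum(a, b):
    """Knuth's TwoSum: s + e == a + b exactly (barring overflow)."""
    s = a + b
    bb = s - a
    return s, (a - (s - bb)) + (b - bb)

def split(a):
    """Veltkamp splitting of a binary64 value into high and low halves."""
    c = 134217729.0 * a      # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi

def two_prod(a, b):
    """Dekker's TwoProd: p + e == a * b exactly (no overflow/underflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    return p, al * bl - (((p - ah * bh) - al * bh) - ah * bl)

class LongAccumulator:
    """Kulisch-style fixed-point register stored as signed radix-2**32 digits.

    Digits may temporarily exceed 32 bits; carries are propagated lazily.
    """
    def __init__(self):
        self.digits = [0] * NDIGITS
        self.pending = 0

    def add(self, x):
        if x == 0.0:
            return
        n, d = x.as_integer_ratio()        # exact: x == n / d, d a power of 2
        fixed = n * ((1 << SHIFT) // d)    # exact integer x * 2**1074
        sign, mag, i = (-1 if fixed < 0 else 1), abs(fixed), 0
        while mag:                         # scatter magnitude into 32-bit digits
            self.digits[i] += sign * (mag & MASK)
            mag >>= RADIX_BITS
            i += 1
        self.pending += 1
        if self.pending >= NORMALIZE_EVERY:
            self.normalize()

    def normalize(self):
        carry = 0
        for i in range(NDIGITS - 1):
            v = self.digits[i] + carry
            self.digits[i] = v & MASK
            carry = v >> RADIX_BITS        # floor shift also handles negatives
        self.digits[-1] += carry
        self.pending = 0

    def to_float(self):
        total = sum(v << (RADIX_BITS * i) for i, v in enumerate(self.digits))
        return total / (1 << SHIFT)        # int / int is correctly rounded

def exact_dot(xs, ys, p=3):
    """Filtering stage (size-p expansion) in front of the long accumulator."""
    acc, expansion = LongAccumulator(), [0.0] * p

    def push(v):
        for i in range(p):
            expansion[i], v = two_sum(expansion[i], v)
            if v == 0.0:
                return
        acc.add(v)                         # residual the expansion cannot hold

    for a, b in zip(xs, ys):
        for term in two_prod(a, b):        # product and its exact rounding error
            push(term)
    for term in expansion:
        acc.add(term)
    return acc.to_float()

def matmul(A, B, p=3):
    cols = list(zip(*B))
    return [[exact_dot(row, col, p) for col in cols] for row in A]

if __name__ == "__main__":
    x, y = [1e16, 1.0, -1e16], [1.0, 1.0, 1.0]
    naive = 0.0
    for a, b in zip(x, y):
        naive += a * b
    print(naive, exact_dot(x, y))          # 0.0 1.0

    # Any permutation of the terms gives the same, correctly rounded result.
    random.seed(0)
    xs = [random.uniform(-1, 1) * 10.0 ** random.randint(-20, 20) for _ in range(100)]
    ys = [random.uniform(-1, 1) * 10.0 ** random.randint(-20, 20) for _ in range(100)]
    ref = float(sum(Fraction(a) * Fraction(b) for a, b in zip(xs, ys)))
    pairs = list(zip(xs, ys))
    random.shuffle(pairs)
    sx, sy = zip(*pairs)
    print(exact_dot(xs, ys) == exact_dot(sx, sy) == ref)   # True
```

Because every step in this sketch is error-free, the value held by the expansion plus the accumulator is always the exact partial sum, so the final rounding does not depend on the order in which terms arrive; that order independence is what yields bitwise reproducibility. A parallel variant might give each thread its own expansion and merge the residuals into a shared accumulator, but this sketch is purely sequential and makes no claim about the paper's GPU design.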
Main file
paper.pdf (181 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01102877, version 1 (13-01-2015)

Identifiers

  • HAL Id: hal-01102877, version 1

Cite

Roman Iakymchuk, David Defour, Caroline Collange, Stef Graillat. Reproducible and Accurate Matrix Multiplication for GPU Accelerators. 2015. ⟨hal-01102877⟩