Reproducible and Accurate Matrix Multiplication for GPU Accelerators

Abstract: Due to the non-associativity of floating-point operations and dynamic scheduling on parallel architectures, obtaining a bitwise-reproducible floating-point result across multiple executions of the same code on different, or even similar, parallel architectures is challenging. In this paper, we address the problem of reproducibility in the context of matrix multiplication and propose an algorithm that yields both reproducible and accurate results. This algorithm is composed of two main stages: a filtering stage that uses fast vectorized floating-point expansions in conjunction with error-free transformations, and an accumulation stage based on Kulisch long accumulators in a high-radix carry-save representation. Finally, we provide implementations and performance results in parallel environments such as GPUs.
Document type: Preprint, working paper (2015)

Cited literature: 21 references

https://hal.archives-ouvertes.fr/hal-01102877
Contributor: Roman Iakymchuk
Submitted on: Tuesday, January 13, 2015 - 16:14:27
Last modified on: Friday, November 16, 2018 - 02:02:20
Long-term archiving on: Tuesday, April 14, 2015 - 11:16:21

File: paper.pdf (files produced by the author(s))

Identifiers

  • HAL Id : hal-01102877, version 1

Citation

Roman Iakymchuk, David Defour, Sylvain Collange, Stef Graillat. Reproducible and Accurate Matrix Multiplication for GPU Accelerators. 2015. 〈hal-01102877〉


Metrics

Record views: 945
File downloads: 903