Interval matrix multiplication on parallel architectures

Philippe Théveny 1,2,3,*, Nathalie Revol 1,3
* Corresponding author
1 ARENAIRE - Computer arithmetic
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
3 ARIC - Arithmetic and Computing
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
Abstract: Implementing interval arithmetic computations efficiently is a difficult task. The work presented here deals with the efficient implementation of interval matrix multiplication on parallel architectures. A first issue is the choice of formulas. The main principle we adopted consists in resorting, as often as possible, to optimized routines such as the BLAS3, as implemented for instance in Intel's MKL. To this end, the formulas chosen to implement interval arithmetic operations are based on the representation of intervals by their midpoint and radius, an approach advocated by S. Rump in 1999 and used in particular in his IntLab implementation. A panel of formulas for operations on the midpoint-radius representation exists: exact formulas can be found in A. Neumaier's book "Interval Methods for Systems of Equations" (1990); in 1999, S. Rump gave approximate formulas requiring fewer operations; and in his Ph.D. thesis (2011), H.-D. Nguyen proposed formulas reaching various tradeoffs between operation count and accuracy. These formulas for the addition and multiplication of two intervals can be plugged into the classical matrix-multiplication algorithm and then expressed as operations (additions and multiplications) on matrices of real numbers (either midpoints or radii); S. Rump recapitulates some such matrix expressions in a recent paper (2012). In this presentation, the merits of each approach are discussed in terms of number of elementary operations, use of BLAS3 routines for the matrix products, and accuracy. A first comparison of the relative accuracies assumes that arithmetic operations are performed exactly; a second comparison assumes that they are performed in floating-point arithmetic. A second issue concerns the adaptation to the architecture.
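To make the midpoint-radius approach concrete, here is a minimal sketch of Rump's 1999 approximate formula for the product of two interval matrices, written with NumPy so that every heavy step is a plain real matrix product (hence delegable to an optimized BLAS3). The function name `interval_matmul` is illustrative, not from the abstract; the sketch also ignores floating-point rounding, which a rigorous implementation would account for (e.g. with directed rounding) by further inflating the radius.

```python
import numpy as np

def interval_matmul(Am, Ar, Bm, Br):
    """Enclose the product of interval matrices <Am, Ar> and <Bm, Br>
    (midpoint-radius representation, entrywise nonnegative radii).

    Rump's (1999) approximate formulas: the result encloses the exact
    interval product, with the radius overestimated by at most a
    factor 1.5.  All three heavy steps are real matrix products, so an
    optimized BLAS3 implementation does the work.

    Note: rounding errors of the floating-point products are NOT
    accounted for here; a rigorous code would inflate Cr accordingly.
    """
    Cm = Am @ Bm                                    # midpoint: 1 product
    Cr = np.abs(Am) @ Br + Ar @ (np.abs(Bm) + Br)   # radius:   2 products
    return Cm, Cr

# 1x1 example: <2, 0.5> * <3, 1>, i.e. [1.5, 2.5] * [2, 4] = [3, 10]
Cm, Cr = interval_matmul(np.array([[2.0]]), np.array([[0.5]]),
                         np.array([[3.0]]), np.array([[1.0]]))
# Cm = [[6.0]], Cr = [[4.0]]: the enclosure [2, 10] contains [3, 10]
```

The 1x1 case shows the tradeoff discussed above: the enclosure [2, 10] is slightly wider than the exact product [3, 10], the price paid for expressing the whole computation as three BLAS3 calls.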
Indeed, the architectures targeted in this study are parallel architectures such as multicores or GPUs. On such architectures, measures such as the arithmetic operation count are no longer relevant: measured execution times do not relate directly to operation counts. This is explained by considerations of memory usage, multithreaded computation, and so on. We will show experiments that take these architectural parameters into account and achieve good performance. We will also give some tradeoffs between memory consumption and memory traffic: it can, for instance, be beneficial to copy (parts of) the matrices involved into the right caches to avoid cache misses and heavy traffic.
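The cache-oriented copying mentioned above is the classical blocking (tiling) idea. As a hedged illustration only (the abstract does not give the authors' actual scheme), here is a generic blocked real matrix product in which each partial product touches only three cache-sized tiles; the block size `bs` is a per-architecture tuning parameter and 16 below is just a placeholder.

```python
import numpy as np

def blocked_matmul(A, B, bs=16):
    """Tiled real matrix product C = A @ B.

    The three nested loops walk over bs-by-bs tiles so that each
    partial product A-tile @ B-tile fits in cache, reducing cache
    misses and memory traffic compared with streaming whole rows and
    columns.  NumPy slicing clips at the matrix edges, so non-multiple
    dimensions are handled automatically.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must agree"
    C = np.zeros((n, m))
    for i in range(0, n, bs):
        for j in range(0, m, bs):
            for p in range(0, k, bs):
                # each update reads/writes only three small tiles
                C[i:i+bs, j:j+bs] += A[i:i+bs, p:p+bs] @ B[p:p+bs, j:j+bs]
    return C
```

In practice, a library such as MKL already performs this blocking internally; the point of the sketch is the memory-traffic tradeoff the abstract refers to, namely that copying and reusing cache-resident tiles beats repeatedly fetching operands from main memory.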

https://hal.archives-ouvertes.fr/hal-00750022
Contributor: Nathalie Revol
Submitted on : Thursday, November 8, 2012 - 7:47:09 PM
Last modification on : Thursday, January 17, 2019 - 3:16:03 PM

Identifiers

  • HAL Id : hal-00750022, version 1

Citation

Philippe Théveny, Nathalie Revol. Interval matrix multiplication on parallel architectures. SCAN 2012: 15th GAMM-IMACS International Symposium on Scientific Computing, Computer Arithmetic and Verified Numerical Computations, Sep 2012, Novosibirsk, Russia. ⟨hal-00750022⟩
