On Numerical Resiliency in Numerical Linear Algebra Solvers

Abstract : In this talk we will discuss possible numerical remedies to survive data loss in some numerical linear algebra solvers, namely Krylov subspace linear solvers and some widely used eigensolvers. We will present a new class of numerical fault-tolerance algorithms at the application level that do not require extra resources, i.e., computational units or computing time, when no fault occurs. Assuming that a separate mechanism ensures fault detection, we propose numerical algorithms to extract relevant information from the available data after a fault. After data extraction, a well-chosen part of the missing data is regenerated through interpolation strategies to constitute meaningful inputs to numerically restart the algorithm. We have designed these methods, called interpolation-restart techniques, for the solution of linear systems and for eigensolvers. We will also present some preliminary investigations into soft error detection, again at the application level, in the conjugate gradient framework. Finally, we will present the numerous open questions we are facing, which will hopefully lead to fruitful discussions.
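To make the interpolation-restart idea concrete, here is a minimal Python sketch under assumptions that are ours, not the talk's: a small dense symmetric positive definite matrix, the conjugate gradient (CG) method as the Krylov solver, the loss of one block of iterate entries (the hypothetical index list `lost`), and recovery by a linear interpolation that solves the diagonal block A[L][L] x_L = b_L - A[L][K] x_K using the surviving entries x_K before restarting CG. All function names are hypothetical and this is an illustration, not the authors' implementation. Nothing is saved in advance, so the fault-free path costs nothing extra.

```python
# Sketch of one interpolation-restart step for CG (illustrative assumptions:
# dense SPD matrix, one lost block of iterate entries, linear interpolation).

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting (small systems only)."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def cg(A, b, x, iters, tol=1e-12):
    """Plain conjugate gradient started from the iterate x."""
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    p = list(r)
    rr = dot(r, r)
    for _ in range(iters):
        if rr ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

def interpolate_lost(A, b, x, lost):
    """Regenerate x[lost] by solving A_LL x_L = b_L - A_LK x_K."""
    kept = [j for j in range(len(x)) if j not in lost]
    A_LL = [[A[i][j] for j in lost] for i in lost]
    rhs = [b[i] - sum(A[i][j] * x[j] for j in kept) for i in lost]
    x = list(x)  # surviving entries are left untouched
    for i, v in zip(lost, solve_dense(A_LL, rhs)):
        x[i] = v
    return x

# Usage: 1D Laplacian test problem, fault after 3 CG iterations.
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
lost = [2, 3, 4]                     # hypothetical block held by the failed node
x = cg(A, b, [0.0] * n, iters=3)     # progress made before the fault
for i in lost:
    x[i] = 0.0                       # the lost entries are gone
x = interpolate_lost(A, b, x, lost)  # regenerate them from surviving data
x = cg(A, b, x, iters=2 * n)         # numerically restart the solver
```

Because the interpolated entries satisfy the equations of the lost rows exactly, the restarted CG begins from an iterate consistent with the surviving data instead of from an arbitrary value for the lost block.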

https://hal.archives-ouvertes.fr/hal-01162627
Contributor : Luc Giraud
Submitted on : Thursday, June 11, 2015 - 9:24:20 AM
Last modification on : Wednesday, September 18, 2019 - 1:14:33 AM

Identifiers

  • HAL Id : hal-01162627, version 1

Citation

Emmanuel Agullo, Luc Giraud, Pablo Salas, Emrullah Fatih Yetkin, Mawussi Zounon. On Numerical Resiliency in Numerical Linear Algebra Solvers. Salishan Conference on High-Speed Computing, DOE laboratories, Apr 2015, Salishan, United States. ⟨hal-01162627⟩
