GPU-Accelerated Generation of Correctly Rounded Elementary Functions

Pierre Fortin (1), Mourad Gouicem (2), Stef Graillat (1)
(1) PEQUAN - Performance et Qualité des Algorithmes Numériques, LIP6 - Laboratoire d'Informatique de Paris 6
(2) ECO - Exact Computing, LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract : The IEEE 754-2008 standard recommends the correct rounding of some elementary functions. This requires solving the Table Maker’s Dilemma (TMD), which takes a huge amount of CPU computation time. In this article, we consider accelerating such computations, namely the Lefèvre algorithm, on graphics processing units (GPUs), which are massively parallel architectures with partial single-instruction, multiple-data (SIMD) execution. We first propose an analysis of the Lefèvre hard-to-round argument search using the concept of continued fractions. We then propose a new parallel search algorithm that is much more efficient on GPUs thanks to its more regular control flow. We also present an efficient hybrid CPU-GPU deployment of the generation of the polynomial approximations required in the Lefèvre algorithm. In the end, we obtain overall speedups of up to 53.4× on one GPU over a sequential CPU execution and up to 7.1× over a hex-core CPU, which enables a much faster solution of the TMD for the double-precision format.
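The abstract names continued fractions as the tool used to analyze the hard-to-round argument search. As a minimal sketch of that concept only (this is not the Lefèvre algorithm and not the authors' GPU code), the following computes the partial quotients and convergents of a rational number. The function names `continued_fraction` and `convergents` are hypothetical and chosen here for illustration.

```python
from fractions import Fraction


def continued_fraction(x, max_terms=32):
    """Return the partial quotients [a0, a1, a2, ...] of a rational x.

    Illustrative helper (hypothetical name): repeatedly take the floor
    of x, then invert the remaining fractional part.
    """
    x = Fraction(x)
    terms = []
    for _ in range(max_terms):
        a = x.numerator // x.denominator  # floor(x); denominator is positive
        terms.append(a)
        frac = x - a
        if frac == 0:
            break  # the expansion of a rational number is finite
        x = 1 / frac
    return terms


def convergents(terms):
    """Return the successive convergents p_k/q_k of a continued fraction.

    Uses the standard recurrence p_k = a_k*p_{k-1} + p_{k-2},
    q_k = a_k*q_{k-1} + q_{k-2}.
    """
    p_prev, q_prev = 1, 0
    p, q = terms[0], 1
    result = [Fraction(p, q)]
    for a in terms[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append(Fraction(p, q))
    return result


if __name__ == "__main__":
    cf = continued_fraction(Fraction(415, 93))
    print(cf)
    print(convergents(cf))
```

Exact rational arithmetic via `fractions.Fraction` is used so that the floor and inversion steps carry no rounding error; this is a design choice for the sketch, not something taken from the article.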

https://hal.archives-ouvertes.fr/hal-00751446
Contributor : Mourad Gouicem
Submitted on : Wednesday, June 5, 2013 - 1:44:08 PM
Last modification on : Tuesday, May 14, 2019 - 10:06:16 AM
Long-term archiving on : Friday, September 6, 2013 - 4:12:08 AM

Citation

Pierre Fortin, Mourad Gouicem, Stef Graillat. GPU-Accelerated Generation of Correctly Rounded Elementary Functions. ACM Transactions on Mathematical Software, Association for Computing Machinery, 2016, 43 (3), pp.22:1--22:26. ⟨10.1145/2935746⟩. ⟨hal-00751446v2⟩
