Preprint, Working Paper. Year: 2009

Optimizing correctly-rounded reciprocal square roots for embedded VLIW cores

Abstract

This paper presents an optimized software implementation of the reciprocal square root function for IEEE binary32 floating-point data, with correct rounding to nearest. Its main feature is a high exposure of instruction-level parallelism (ILP), which results from an extension of the polynomial evaluation-based method of [JeKnMoRe08] and from the design of a specific rounding procedure. The implementation proves very efficient on some VLIW processor cores such as STMicroelectronics' ST231 (used mainly for embedded media processing), where a low latency of 29 cycles has been measured.
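To make "correct rounding to nearest" concrete, here is a minimal reference sketch in Python. It is not the method of the paper (which targets the ST231 and relies on polynomial evaluation and a dedicated rounding procedure); it is only a slow checker that uses exact rational arithmetic to decide which binary32 value is nearest to 1/sqrt(x). The names `to_f32`, `f32_neighbors` and `rsqrt_rn_reference` are hypothetical helpers introduced for this illustration, and the input is assumed to be a positive, finite value that is already representable in binary32.

```python
import math
import struct
from fractions import Fraction


def to_f32(v: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def f32_neighbors(v: float):
    """Return the binary32 values just below and just above a positive normal v."""
    bits = struct.unpack("<I", struct.pack("<f", v))[0]
    below = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return below, above


def rsqrt_rn_reference(x: float) -> float:
    """Reference binary32 1/sqrt(x), rounded to nearest (assumes x > 0, finite, binary32)."""
    if not (x > 0 and math.isfinite(x)):
        raise ValueError("x must be positive and finite")
    exact_x = Fraction(x)
    # Initial guess from binary64 arithmetic; it may be off by one binary32 step.
    c = to_f32(1.0 / math.sqrt(x))
    while True:
        lo, hi = f32_neighbors(c)
        m_lo = (Fraction(lo) + Fraction(c)) / 2  # midpoint below the candidate
        m_hi = (Fraction(c) + Fraction(hi)) / 2  # midpoint above the candidate
        # For positive m: 1/sqrt(x) < m  is equivalent to  m*m*x > 1 (exact test).
        if m_lo * m_lo * exact_x > 1:
            c = lo  # true value lies below the lower midpoint
        elif m_hi * m_hi * exact_x < 1:
            c = hi  # true value lies above the upper midpoint
        else:
            return c  # true value lies between the two midpoints


print(rsqrt_rn_reference(4.0))
```

For positive binary32 inputs the result always falls in the normal range, so stepping the bit pattern by one is enough to reach the neighbouring values. The sketch does not treat exact ties specially; it simply returns the current candidate if the exact value coincides with a midpoint.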
Main file
rsqrt.pdf (86.85 KB)
Origin: Files produced by the author(s)

Dates and versions

ensl-00391185, version 1 (03-06-2009)
ensl-00391185, version 2 (25-11-2009)

Identifiers

  • HAL Id: ensl-00391185, version 1

Cite

Claude-Pierre Jeannerod, Guillaume Revy. Optimizing correctly-rounded reciprocal square roots for embedded VLIW cores. 2009. ⟨ensl-00391185v1⟩