An Efficient Memory Operations Optimization Technique for Vector Loops on Itanium 2 Processors
Journal article, Concurrency and Computation: Practice and Experience, 2006

Abstract

To sustain a high degree of instruction-level parallelism (ILP), the Itanium 2 cache system uses a complex organization: load/store queues, banking, and interleaving. In this paper, we study the impact of this cache system on the scheduling of memory instructions. We demonstrate that, if no care is taken at compile time, the imprecise memory disambiguation mechanism and the banking structure cause severe performance loss, even for very simple regular codes. We also show that grouping memory operations in a pseudo-vectorized way enables the compiler to generate more effective code for the Itanium 2 processor. The impact of this code optimization technique on register pressure is analyzed for various vectorization schemes.
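The abstract does not spell out the transformation, so the following C sketch only illustrates the general idea of grouping memory operations in a pseudo-vectorized way on a simple vector loop. The vector length `VL = 4`, the function names `add_scalar` and `add_grouped`, and the use of `restrict` to rule out aliasing are hypothetical choices for illustration, not the authors' code; the actual instruction schedule on Itanium 2 is still decided by the compiler.

```c
#include <stddef.h>

/* Reference loop: one load of b, one load of c and one store to a per iteration. */
void add_scalar(double *a, const double *b, const double *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Hypothetical pseudo-vectorized version (assumed vector length VL = 4).
   Within each block, all loads from b are grouped, then all loads from c,
   then the additions, then all stores to a.
   Assumption: a does not overlap b or c (stated with restrict); otherwise
   hoisting the loads above the stores could change the result. */
#define VL 4
void add_grouped(double *restrict a, const double *restrict b,
                 const double *restrict c, size_t n) {
    size_t i = 0;
    for (; i + VL <= n; i += VL) {
        double vb[VL], vc[VL], va[VL];
        for (size_t k = 0; k < VL; k++) vb[k] = b[i + k];   /* group of loads from b */
        for (size_t k = 0; k < VL; k++) vc[k] = c[i + k];   /* group of loads from c */
        for (size_t k = 0; k < VL; k++) va[k] = vb[k] + vc[k];
        for (size_t k = 0; k < VL; k++) a[i + k] = va[k];   /* group of stores to a */
    }
    for (; i < n; i++)  /* remainder iterations when n is not a multiple of VL */
        a[i] = b[i] + c[i];
}
```

A larger `VL` widens each group of loads and stores but keeps more values live at the same time (the temporaries `vb`, `vc` and `va`), which is one way to picture the register-pressure trade-off the abstract mentions.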
Main file: An_Efficient_Memory.pdf (226.74 KB)
Origin: files produced by the author(s)

Dates and versions

hal-00130629, version 1 (28-09-2011)

Citation

William Jalby, Christophe Lemuet, Sid Touati. An Efficient Memory Operations Optimization Technique for Vector Loops on Itanium 2 Processors. Concurrency and Computation: Practice and Experience, 2006, 18 (11), pp. 1485-1508. ⟨10.1002/cpe.1017⟩. ⟨hal-00130629⟩

Collections

CNRS, UVSQ
