HAL : hal-00130629, version 1

An Efficient Memory Operations Optimization Technique for Vector Loops on Itanium 2 Processors
Jalby W., Lemuet C., TOUATI S.-A.-A.
Concurrency and Computation: Practice and Experience 18, 11 (2006) 1485-1508 - http://hal.archives-ouvertes.fr/hal-00130629
Peer-reviewed journal articles
Computer Science/Other
An Efficient Memory Operations Optimization Technique for Vector Loops on Itanium 2 Processors
William Jalby 1, Christophe Lemuet 1, Sid-Ahmed-Ali TOUATI 1
1 :  Parallélisme, Réseaux, Systèmes d'information, Modélisation (PRISM)
http://www.prism.uvsq.fr/
CNRS : UMR8144 – Université de Versailles Saint-Quentin-en-Yvelines
45 avenue des Etats-Unis Bâtiment Descartes 78035 Versailles CEDEX
France
To sustain a high degree of instruction-level parallelism (ILP), the Itanium 2 cache systems use a complex organization scheme: load/store queues, banking, and interleaving. In this paper, we study the impact of these cache systems on the scheduling of memory instructions. We demonstrate that, if no care is taken at compile time, the imprecise memory disambiguation mechanism and the banking structure cause severe performance loss, even for very simple regular codes. We also show that grouping memory operations in a pseudo-vectorized way enables the compiler to generate more effective code for the Itanium 2 processor. Finally, we analyze the impact of this code optimization technique on register pressure for various vectorization schemes.
English

Concurrency and Computation: Practice and Experience
Publisher Wiley-Blackwell
ISSN 1532-0626 (eISSN : 1532-0634)
International audience
Date 09/2006
Volume 18
Issue 11
Pages 1485-1508

performance measurement – cache optimization – memory access optimization – bank conflicts – memory address disambiguation – instruction level parallelism

Files attached to this document:
PDF
An_Efficient_Memory.pdf(248.7 KB)