
Efficient Complex Matrix Multiplication on the Synergistic Processing Element of the Cell Processor

Abstract: To implement a complete Fast Multipole Method on the Cell processor, we need an efficient complex matrix multiplication on each Synergistic Processing Element (SPE). Since the latest IBM SDK does not provide such a routine, we build our own in single precision, written in C. We show that complex matrix multiplication requires a specific computation scheme for the micro-kernel running on the SPE, and that a 32×32 tile is appropriate both for near-peak computation and for overlapping communication with computation. Our micro-kernel delivers 23.74 Gflop/s, which is 92.7% of the SPE peak performance. We obtain up to 23.65 Gflop/s for one complete complex matrix product on one SPE, and up to 378.36 Gflop/s for 16 products on 16 SPEs.
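As an illustration of the operation the abstract describes, the following is a minimal scalar sketch in C of a single-precision complex product accumulated on one 32×32 tile. It is not the authors' micro-kernel: it uses no SPU SIMD intrinsics, no DMA transfers, and no special computation scheme. The names `cfloat`, `cgemm_tile`, and `tile_flops`, the interleaved real/imaginary layout, and the row-major storage are all assumptions made for this example.

```c
#define TILE 32  /* tile edge taken from the abstract */

/* Hypothetical interleaved layout: real part, then imaginary part. */
typedef struct { float re, im; } cfloat;

/* Reference computation C += A * B on one TILE x TILE tile (row-major).
 * Each complex multiply-add costs 4 real multiplies and 4 real adds. */
static void cgemm_tile(const cfloat *a, const cfloat *b, cfloat *c)
{
    for (int i = 0; i < TILE; i++) {
        for (int k = 0; k < TILE; k++) {
            const cfloat aik = a[i * TILE + k];
            for (int j = 0; j < TILE; j++) {
                const cfloat bkj = b[k * TILE + j];
                cfloat *cij = &c[i * TILE + j];
                cij->re += aik.re * bkj.re - aik.im * bkj.im;
                cij->im += aik.re * bkj.im + aik.im * bkj.re;
            }
        }
    }
}

/* Real floating-point operations for one tile product: 8 * TILE^3. */
static long tile_flops(void)
{
    return 8L * TILE * TILE * TILE;
}
```

Under this flop count, one tile product performs 262144 real operations. Dividing that count by a measured time gives a Gflop/s rate of the kind reported in the abstract, whose 92.7% figure implies an SPE single-precision peak of about 25.6 Gflop/s.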
Document type: Conference papers

https://hal.archives-ouvertes.fr/hal-01290736
Contributor: Lip6 Publications
Submitted on: Friday, March 18, 2016 - 2:42:37 PM
Last modification on: Thursday, March 21, 2019 - 1:14:00 PM

Citation

Quentin Bourgerie, Pierre Fortin, Jean-Luc Lamotte. Efficient Complex Matrix Multiplication on the Synergistic Processing Element of the Cell Processor. Workshop on Parallel Programming and Applications on Accelerator Clusters (PPAAC10), held in conjunction with IEEE Cluster 2010, Sep 2010, Heraklion, Crete, Greece. pp.1-8, ⟨10.1109/CLUSTERWKSP.2010.5613077⟩. ⟨hal-01290736⟩
