Parallel expression template for large vectors
Abstract
This paper describes a short and simple way of improving the performance of vector operations (e.g. X = aY + bZ + ...) applied to large vectors. In a previous paper [1] we described how to take advantage of the high-performance vector copy operation provided by the ATLAS library [2] in the context of the C++ Expression Template (ET) mechanism. Here we present a multi-threaded implementation of this approach. The proposed ET implementation, which involves a parallel blocking technique, leads to a significant performance increase compared to existing implementations (by a factor of up to 2.7) on dual-socket x86_64 targets.
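To make the idea concrete, the following is a minimal sketch of an expression template whose assignment is evaluated block by block on several threads. It is not the authors' implementation: the names `Expr`, `Vec`, `Scaled`, `Sum` and `assign_parallel` are hypothetical, `std::thread` is an assumed threading mechanism, and a plain element loop stands in for the ATLAS copy routine mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// All names below are illustrative assumptions, not the paper's API.

// CRTP base: lets operators accept any expression node by its static type.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Leaf node: a dense vector of doubles.
struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Node for a * e, evaluated lazily element by element.
template <class E>
struct Scaled : Expr<Scaled<E>> {
    double a;
    const E& e;
    Scaled(double a_, const E& e_) : a(a_), e(e_) {}
    double operator[](std::size_t i) const { return a * e[i]; }
    std::size_t size() const { return e.size(); }
};

// Node for l + r, evaluated lazily element by element.
template <class L, class R>
struct Sum : Expr<Sum<L, R>> {
    const L& l;
    const R& r;
    Sum(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

template <class E>
Scaled<E> operator*(double a, const Expr<E>& e) {
    return Scaled<E>(a, e.self());
}

template <class L, class R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

// Evaluates expr into dst. Each thread owns one contiguous range of indices
// and walks it in fixed-size blocks, so no two threads write the same element.
template <class E>
void assign_parallel(Vec& dst, const Expr<E>& expr,
                     unsigned nthreads = 2, std::size_t block = 4096) {
    const E& e = expr.self();
    const std::size_t n = dst.size();
    assert(e.size() == n);
    nthreads = std::max(1u, nthreads);
    block = std::max<std::size_t>(1, block);
    const std::size_t chunk = (n + nthreads - 1) / nthreads;

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t lo = std::min(n, t * chunk);
        const std::size_t hi = std::min(n, lo + chunk);
        pool.emplace_back([&dst, &e, lo, hi, block] {
            for (std::size_t b = lo; b < hi; b += block) {
                const std::size_t end = std::min(hi, b + block);
                for (std::size_t i = b; i < end; ++i) dst.data[i] = e[i];
            }
        });
    }
    for (auto& th : pool) th.join();
}
```

With vectors `x`, `y` and `z` of equal length, a call such as `assign_parallel(x, 2.0 * y + 3.0 * z)` evaluates the whole expression in a single pass without temporaries. In this sketch the expression nodes hold references, so the expression has to be built in the same statement that consumes it.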
Origin: Files produced by the author(s)