Transformations for Energy Efficient Accelerated Chain Matrix Multiplication (TEE-ACM²)
Abstract
Matrix chain multiplication on GPUs is a building block in a wide range of scientific domains such as computer graphics, physics, and machine learning. While its time performance has been studied for years, significantly less effort has gone into optimizing its energy efficiency. GPU power consumption is heavily affected by the number of data transfers performed: a data transfer from global memory needs a thousand times more energy than a double-precision arithmetic operation. Minimizing data transfers is therefore key to reducing energy consumption. We present an energy-efficient solution for matrix chain multiplication on GPUs that minimizes computation as well as off-chip data transfers. To this end, we provide optimizations at three levels. For a single matrix multiplication, we use a blocking strategy that achieves the minimum number of global memory loads for a given amount of shared memory. We then extend this approach to products of three matrices to reduce data transfers even further. Finally, we use a parenthesization algorithm that minimizes both the number of computations and the number of memory transfers for a whole sequence of matrices.
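To illustrate the third level, the sketch below shows the textbook dynamic program for choosing a parenthesization of a matrix chain. It is not the algorithm from the paper: the abstract does not give its cost model, so the default `multiply_cost` counts only scalar multiplications, and the `cost` parameter is a hypothetical hook where a term for memory transfers could be added. The names `best_parenthesization`, `multiply_cost`, and the `A0, A1, ...` labels are assumptions made for this example.

```python
from functools import lru_cache


def multiply_cost(rows, inner, cols):
    """Scalar multiplications for a (rows x inner) @ (inner x cols) product.

    Assumption: this counts arithmetic only. A cost model that also charges
    for global-memory transfers could be passed in instead.
    """
    return rows * inner * cols


def best_parenthesization(dims, cost=multiply_cost):
    """Return (total_cost, expression) for the cheapest multiplication order.

    Matrix i has shape dims[i] x dims[i + 1], so a chain of n matrices is
    described by n + 1 dimensions.
    """
    n = len(dims) - 1
    if n < 1:
        raise ValueError("need at least one matrix")

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Cheapest way to compute the product of matrices i..j (inclusive).
        if i == j:
            return 0, f"A{i}"
        best = None
        for k in range(i, j):
            # Split into (i..k) and (k+1..j), then multiply the two results.
            left_cost, left_expr = solve(i, k)
            right_cost, right_expr = solve(k + 1, j)
            total = (left_cost + right_cost
                     + cost(dims[i], dims[k + 1], dims[j + 1]))
            if best is None or total < best[0]:
                best = (total, f"({left_expr} {right_expr})")
        return best

    return solve(0, n - 1)


if __name__ == "__main__":
    # Shapes: A0 is 10x30, A1 is 30x5, A2 is 5x60.
    print(best_parenthesization([10, 30, 5, 60]))
```

For the three matrices in the usage example, multiplying `A0` and `A1` first costs 1500 + 3000 = 4500 scalar multiplications, whereas multiplying `A1` and `A2` first costs 9000 + 18000 = 27000, so the order alone changes the work by a factor of six.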
Domains
Other [cs.OH]
Origin: Files produced by the author(s)