DFG implementation on multi GPU cluster with computation-communication overlap

Abstract : Nowadays, computers embed many CPUs and at least one GPU. Workstations can host several GPU cards, which are well suited for scientific and engineering computations. Such computers are linked through high-bandwidth networks to form clusters for HPC. These machines provide highly parallel multicore architectures while being cost-effective. Moreover, they significantly reduce power dissipation and space requirements compared to classical HPC clusters. Recently, NVIDIA and ATI announced their Tesla and FireStream boards, which deliver more than 500 gigaflops in double precision while dissipating less than 250 W per single-GPU board. However, the real challenge is to achieve the highest performance on multi-GPU architectures. The programmer has to design architecture-specific code that handles GPU communications, memory management, task scheduling, and synchronization. A high-level abstract programming model is therefore required to express all of these operations. In this paper, we propose a design flow that allows an efficient implementation of a digital signal processing (DSP) application, specified as a data flow graph (DFG), on a multi-GPU computer cluster. We focus particularly on the effective implementation of communications by automating the computation-communication overlap. After presenting the related work, we show the benefit of implementing computation-communication overlap on multi-GPU architectures. Then, we present our design flow, which allows an efficient implementation of an algorithm expressed as a DFG on a multi-GPU architecture. Finally, we apply it to a real-world 3D granulometry application developed for materials research.


Contributor : Vincent Boulos
Submitted on : Friday, January 6, 2012 - 6:39:48 PM
Last modification on : Wednesday, February 20, 2019 - 12:40:03 PM
Long-term archiving on : Saturday, April 7, 2012 - 3:10:46 AM




  • HAL Id : hal-00657536, version 1


Sylvain Huet, Vincent Boulos, Vincent Fristot, Luc Salvo. DFG implementation on multi GPU cluster with computation-communication overlap. Conference on Design and Architectures for Signal and Image Processing (DASIP 2011), Nov 2011, Tampere, Finland. pp.1-8. ⟨hal-00657536⟩


