Efficient implementation of data flow graphs on multi-gpu clusters

Abstract: Nowadays, it is possible to build a multi-GPU supercomputer, well suited for implementing digital signal processing algorithms, for a few thousand dollars. However, to achieve the highest performance with this kind of architecture, the programmer has to focus on inter-processor communications and task synchronization. In this paper, we propose a high-level programming model based on a data flow graph (DFG) that allows an efficient implementation of digital signal processing applications on a multi-GPU computer cluster. This DFG-based design flow abstracts the underlying architecture. We focus particularly on the efficient implementation of communications by automating computation-communication overlap, which can lead to significant speedups, as shown in the presented benchmark. The approach is validated on three experiments: a multi-host multi-GPU benchmark, a 3D granulometry application developed for research on materials, and an application for computing visual saliency maps.
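
As an illustration of the computation-communication overlap mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a two-stage pipeline in which a background thread stands in for the communication stage while the main thread stands in for the computation stage; a real multi-GPU system would rely on asynchronous device transfers instead of Python threads. The names `run_pipeline`, `transfer`, `compute` and `depth` are hypothetical and chosen only for this example.

```python
import queue
import threading


def run_pipeline(blocks, transfer, compute, depth=2):
    """Process blocks in order while overlapping transfer and compute.

    Assumption for illustration: `transfer` models a communication step
    (e.g. a host-to-device copy) and `compute` models a kernel. While the
    main thread computes on block i, a background thread can already be
    transferring block i+1. `depth` bounds the number of staged blocks
    (depth=2 behaves roughly like double buffering).
    """
    staged = queue.Queue(maxsize=depth)  # bounded buffer between the stages
    done = object()                      # sentinel marking the end of the stream

    def producer():
        try:
            for block in blocks:
                staged.put(transfer(block))  # communication stage
        finally:
            staged.put(done)  # always unblock the consumer, even on error

    worker = threading.Thread(target=producer)
    worker.start()

    results = []
    while True:
        item = staged.get()
        if item is done:
            break
        results.append(compute(item))  # computation stage
    worker.join()
    return results
```

Calling `run_pipeline(range(4), transfer, compute)` returns the computed blocks in their original order. When `transfer` and `compute` both spend their time waiting (I/O, device work), the total run time approaches that of the slower stage rather than the sum of both, which is the effect the abstract refers to as overlap.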

Cited literature: 27 references

https://hal.archives-ouvertes.fr/hal-00746981
Contributor: Vincent Boulos
Submitted on: Tuesday, October 30, 2012 - 11:26:51 AM
Last modification on: Wednesday, February 20, 2019 - 12:40:05 PM
Long-term archiving on: Thursday, January 31, 2013 - 3:46:03 AM

File

main.pdf
Files produced by the author(s)

Citation

Vincent Boulos, Sylvain Huet, Vincent Fristot, Luc Salvo, Dominique Houzet. Efficient implementation of data flow graphs on multi-gpu clusters. Journal of Real-Time Image Processing, Springer Verlag, 2012, ⟨10.1007/s11554-012-0279-0⟩. ⟨hal-00746981⟩
