Conference papers

Implementations Impact on Iterative Image Processing for Embedded GPU

Abstract : The emergence of low-power embedded Graphics Processing Units (GPUs) with high computation capabilities has enabled the integration of image processing chains in a wide variety of embedded systems. Various optimisation techniques are, however, needed to get the most out of an embedded GPU. This paper explores several optimisation methods for iterative stencil-like image processing algorithms on embedded NVIDIA GPUs using the Compute Unified Device Architecture (CUDA) API. We focus our architectural optimisations on the TV-L1 algorithm, an optical flow estimation method based on total variation (TV) regularisation and the L1 norm. It is widely used as a model for more complex optical flow estimations and appears in many recent video processing applications. In this work we evaluate the impact of architecture-oriented optimisations on both execution time and energy consumption on several NVIDIA Jetson embedded GPU boards. Results show a speedup of up to 3× compared to state-of-the-art versions as well as a 2.6× decrease in energy consumption.
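For readers unfamiliar with the term, the sketch below illustrates what an "iterative stencil-like" computation looks like in its simplest form. It is a generic CPU-side illustration in plain C, not the TV-L1 algorithm and not the CUDA implementation evaluated in the paper. The function names (`stencil_step`, `stencil_iterate`), the 5-point averaging stencil and the copy-through border handling are all assumptions chosen for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One 5-point stencil pass (hypothetical example stencil): each interior
   pixel becomes the mean of itself and its four direct neighbours.
   Border pixels are copied through unchanged. */
static void stencil_step(const float *src, float *dst, size_t w, size_t h)
{
    memcpy(dst, src, w * h * sizeof *src);
    for (size_t y = 1; y + 1 < h; ++y) {
        for (size_t x = 1; x + 1 < w; ++x) {
            size_t i = y * w + x;
            dst[i] = (src[i] + src[i - 1] + src[i + 1]
                      + src[i - w] + src[i + w]) / 5.0f;
        }
    }
}

/* Runs n_iter passes, swapping the roles of the two buffers after each
   pass (ping-pong). Returns whichever buffer holds the final result. */
float *stencil_iterate(float *a, float *b, size_t w, size_t h, unsigned n_iter)
{
    assert(w >= 3 && h >= 3);
    float *src = a, *dst = b;
    for (unsigned it = 0; it < n_iter; ++it) {
        stencil_step(src, dst, w, h);
        float *tmp = src;
        src = dst;
        dst = tmp;
    }
    return src;
}
```

Each pass reads the whole image and depends on the result of the previous pass, so the data is traversed once per iteration. That repeated traversal is what makes the memory behaviour of such algorithms a natural target for implementation-level optimisation.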
Contributor : Lionel Lacassagne
Submitted on : Wednesday, September 1, 2021 - 11:25:26 AM
Last modification on : Monday, December 6, 2021 - 5:12:03 PM
Long-term archiving on : Thursday, December 2, 2021 - 6:51:14 PM


Files produced by the author(s)


  • HAL Id : hal-03330779, version 1


Thomas Romera, Andrea Petreto, Florian Lemaitre, Manuel Bouyer, Quentin Meunier, et al.. Implementations Impact on Iterative Image Processing for Embedded GPU. European Signal Processing Conference (EUSIPCO), Aug 2021, Dublin, Ireland. ⟨hal-03330779⟩


