Poster communications

Heterogeneity-aware Deep Learning Workload Deployments on the Computing Continuum

Thomas Bouvier 1, Alexandru Costan 1, Gabriel Antoniu 1
1 KerData - Scalable Storage for Clouds and Beyond
Inria Rennes – Bretagne Atlantique, IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
Abstract : The increasing need for real-time analytics has motivated the emergence of new incremental methods to learn representations from continuous flows of data, especially in the context of the Internet of Things. This trend has driven the evolution of centralized computing infrastructures towards interconnected processing units spanning from edge devices to cloud data centers. This new paradigm is referred to as the Computing Continuum, or Edge-to-Cloud Continuum. However, network and compute heterogeneity across and within clusters may negatively impact Deep Learning (DL) training. We introduce a roadmap for understanding the end-to-end performance of DL workloads in such heterogeneous settings. The goal is to identify the key parameters that lead to stragglers and to devise novel intra- and inter-cluster strategies to address them. We will explore various policies that aim to improve makespan, cost, and fairness objectives while ensuring system scalability.

https://hal.archives-ouvertes.fr/hal-03270129
Contributor : Thomas Bouvier
Submitted on : Friday, June 25, 2021 - 12:04:10 PM
Last modification on : Friday, September 10, 2021 - 2:09:24 PM

Identifiers

  • HAL Id : hal-03270129, version 1

Citation

Thomas Bouvier, Alexandru Costan, Gabriel Antoniu. Heterogeneity-aware Deep Learning Workload Deployments on the Computing Continuum. IPDPS 2021 - 35th IEEE International Parallel & Distributed Processing Symposium, May 2021, Virtual / Portland, United States. ⟨hal-03270129⟩
