Preprint, Working Paper, Year: 2020

Collecting and Characterizing Distributed Machine Learning Workloads

Abstract

Machine learning is key to transforming data into actionable knowledge. The rapid increase in the amount of analyzed data has forced a switch to distributed ML platforms. However, the complexity of such platforms is overwhelming for uninitiated users, who may not understand the trade-offs and challenges of parameterizing such systems to achieve good performance. To better analyze and understand ML workloads running on distributed ML platforms, we conducted extensive experiments with various ML methods and real-world datasets, and collected the execution traces of these distributed ML workloads, which represent a total of 12 GB of traces and tens of millions of data records. We then provide a statistical analysis of the collected traces and illustrate, through a use case, how different ML workloads are characterized and their needs identified.
Main file
Collecting_and_Characterizing_Distributed_Machine_Learning_Workloads (3).pdf (320.7 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03343275, version 1 (15-09-2021)

Identifiers

  • HAL Id: hal-03343275, version 1

Cite

Yasmine Djebrouni, Sara Bouchenak, Khalid Benabdeslem. Collecting and Characterizing Distributed Machine Learning Workloads. 2021. ⟨hal-03343275⟩