Collecting and Characterizing Distributed Machine Learning Workloads
Abstract
Machine learning (ML) is key to transforming data into actionable knowledge. The rapid growth in the amount of data to be analyzed has forced a shift to distributed ML platforms. However, the complexity of such platforms is overwhelming for uninitiated users, who may not understand the trade-offs and challenges involved in parameterizing these systems to achieve good performance. To better analyze and understand ML workloads running on distributed ML platforms, we conducted extensive experiments with various ML methods and real-world datasets and collected the execution traces of these distributed ML workloads, which total 12 GB of traces and tens of millions of data records. We then provide a statistical analysis of the collected traces and illustrate, through a use case, how different ML workloads can be characterized and their needs identified.
Origin: files produced by the author(s)