Conference paper, Year: 2015

Leveraging naturally distributed data redundancy to reduce collective I/O replication overhead

Bogdan Nicolae

Abstract

Dumping large amounts of related data simultaneously to local storage devices instead of a parallel file system is a frequent I/O pattern of HPC applications running at large scale. Since local storage resources are prone to failures and have limited potential to serve multiple requests in parallel, techniques such as replication are often used to enable resilience and high availability. However, replication introduces overhead, both in terms of the network traffic necessary to distribute replicas and in terms of extra storage space requirements. To reduce this overhead, state-of-the-art techniques often apply redundancy elimination (e.g., compression or deduplication) before replication, ignoring the natural redundancy that is already present. By contrast, this paper proposes a novel scheme that treats redundancy elimination and replication as a single co-optimized phase: remotely duplicated data is detected and directly leveraged to maintain a desired replication factor by keeping only as many replicas as needed and adding more if necessary. In this context, we introduce a series of high-performance algorithms specifically designed to operate under tight and controllable constraints at large scale. We present how this idea can be leveraged in practice and demonstrate its viability for two real-life HPC applications.
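
To make the core idea concrete, here is a minimal, centralized Python sketch of replication that counts naturally duplicated chunks toward the replication factor. It is an illustration only and not the paper's algorithms, which are designed to run in a distributed fashion at scale. The function name `plan_replication`, the whole-chunk SHA-256 fingerprints, and the least-loaded placement rule are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def plan_replication(node_chunks, k):
    """Plan storage so that every distinct chunk ends up on exactly k nodes.

    node_chunks: dict mapping node id -> list of chunk payloads (bytes).
    k: desired replication factor (hypothetical parameter name).
    Returns (keep, send): keep maps node -> set of fingerprints it stores,
    send is a list of (source, destination, fingerprint) transfers.
    """
    nodes = sorted(node_chunks)
    if k > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")

    # Fingerprint every chunk and record which nodes already hold it.
    holders = defaultdict(set)
    for node, chunks in node_chunks.items():
        for chunk in chunks:
            holders[hashlib.sha256(chunk).hexdigest()].add(node)

    keep = defaultdict(set)  # node -> fingerprints stored on that node
    send = []                # network transfers that are actually needed
    load = defaultdict(int)  # node -> number of chunks assigned so far

    def by_load(candidates):
        # Prefer the least loaded node; break ties by node id.
        return sorted(candidates, key=lambda n: (load[n], n))

    for fp in sorted(holders):
        # Natural copies count toward the factor: keep at most k of them.
        chosen = by_load(holders[fp])[:k]
        for node in chosen:
            keep[node].add(fp)
            load[node] += 1

        # Too few natural copies: send extra replicas to non-holders.
        missing = k - len(chosen)
        others = by_load(n for n in nodes if n not in holders[fp])
        for dest in others[:missing]:
            send.append((chosen[0], dest, fp))
            keep[dest].add(fp)
            load[dest] += 1

    return keep, send
```

For example, with three nodes holding the chunks {a, b}, {a, c}, and {a} and a replication factor of 2, this sketch schedules one transfer each for b and c and none for a, because a already exists on three nodes and only two of those copies need to be kept.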
Main file
paper.pdf (245.8 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01115700, version 1 (11-02-2015)


Cite

Bogdan Nicolae. Leveraging naturally distributed data redundancy to reduce collective I/O replication overhead. IPDPS '15: 29th IEEE International Parallel and Distributed Processing Symposium, May 2015, Hyderabad, India. ⟨10.1109/IPDPS.2015.82⟩. ⟨hal-01115700⟩
