Journal article in Concurrency and Computation: Practice and Experience. Year: 2010

JEL: unified resource tracking for parallel and distributed applications

Abstract

When parallel applications run in large-scale distributed environments such as grids, peer-to-peer (P2P) systems, and clouds, the set of resources in use can change dynamically as machines crash, reservations end, and new resources become available. It is vital for applications to respond to these changes, so they must keep track of the available resources, a problem known to be notoriously difficult. In this article we argue that resource tracking must be provided as standard functionality in the lower parts of the software stack. We propose a general solution to resource tracking: the Join-Elect-Leave (JEL) model. JEL provides unified resource tracking for parallel and distributed applications across environments. It is a simple yet powerful model based on notifying applications when resources have joined or left the computation. We demonstrate that JEL is suitable for resource tracking in a wide variety of programming models, ranging from the fixed resource sets traditionally used in MPI-1 to flexible grid-oriented programming models. We compare several JEL implementations and show that they perform and scale well in several real-world scenarios involving grids, clouds, and P2P systems used concurrently, as well as wide-area systems with failing resources. Using JEL, we have won first prize in a number of international distributed computing competitions. Copyright © 2010 John Wiley & Sons, Ltd.
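
To make the join/leave notification idea concrete, here is a minimal single-process sketch in Python. It is not the authors' implementation or API: the class `Registry`, its methods, and the event strings `"joined"` and `"left"` are hypothetical names chosen for illustration. The election rule (first candidate wins, and an election is forgotten when its winner leaves) is likewise an assumption, since the abstract does not specify election semantics.

```python
from typing import Callable, Dict, List

# Hypothetical listener signature: (event, member), where event is
# "joined" or "left". These names are illustrative, not the JEL API.
Listener = Callable[[str, str], None]


class Registry:
    """Toy in-process stand-in for a JEL-style membership service."""

    def __init__(self) -> None:
        self._members: List[str] = []         # current members, in join order
        self._listeners: List[Listener] = []  # callbacks to notify on changes
        self._elections: Dict[str, str] = {}  # election name -> winner

    def subscribe(self, listener: Listener) -> None:
        """Register a callback that is invoked on every join and leave."""
        self._listeners.append(listener)

    def _notify(self, event: str, member: str) -> None:
        for listener in list(self._listeners):
            listener(event, member)

    def join(self, member: str) -> None:
        """Add a resource to the computation and notify all listeners."""
        if member in self._members:
            raise ValueError(f"{member} has already joined")
        self._members.append(member)
        self._notify("joined", member)

    def leave(self, member: str) -> None:
        """Remove a resource (raises ValueError if unknown) and notify."""
        self._members.remove(member)
        # Assumption: elections won by a departed member are cleared so
        # that they can be held again among the remaining members.
        for name in [n for n, w in self._elections.items() if w == member]:
            del self._elections[name]
        self._notify("left", member)

    def elect(self, name: str, candidate: str) -> str:
        """Return the winner of election `name`; the first candidate wins."""
        return self._elections.setdefault(name, candidate)

    def members(self) -> List[str]:
        """Return a copy of the current member list."""
        return list(self._members)
```

An application would subscribe a listener, then react to each `"joined"` or `"left"` event, for example by redistributing work. A fixed-size model in the style of MPI-1 could simply wait until the expected number of members has joined before starting.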

Main file
PEER_stage2_10.1002%2Fcpe.1592.pdf (746.84 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00686074, version 1 (07-04-2012)

Cite

Niels Drost. JEL: unified resource tracking for parallel and distributed applications. Concurrency and Computation: Practice and Experience, 2010, 23 (1), pp. 17. ⟨10.1002/cpe.1592⟩. ⟨hal-00686074⟩

Collections

PEER