Journal article. Future Generation Computer Systems. Year: 2005

Computing on large-scale distributed systems: XtremWeb architecture, programming models, security, tests and convergence with grid

F. Cappello
  • Role: Author
S. Djilali
  • Role: Author
Gilles Fedak
  • Role: Author
T. Herault
  • Role: Author
F. Magniette
  • Role: Author

Abstract

Global Computing systems belong to the class of large-scale distributed systems. Their properties (high computational, storage and communication performance potential, and high resilience) make them attractive in academia and industry as computing infrastructures that complement more classical infrastructures such as clusters or supercomputers. However, generalizing the use of these systems in a multi-user and multi-parallel programming context involves finding solutions and providing mechanisms for many issues, such as programming bag-of-tasks and message-passing parallel applications; securing the application, the system itself and the computing nodes; and deploying the systems to harness resources managed in different ways. In this paper, we present our research, often influenced by user demands, towards a computational peer-to-peer system called XtremWeb. We describe (a) the architecture of the system and its motivations, (b) the parallel programming paradigms available in XtremWeb and how they are implemented, (c) the deployment issues and the mechanisms used to harness simultaneously uncoordinated sets of resources and resources managed by batch schedulers, and (d) the security issue and how we address, inside XtremWeb, the protection of the computing resources. We present two multi-parametric applications to be used in production: Aires, belonging to the high energy physics (HEP) Auger project, and a protein conformation predictor using a molecular dynamics simulator. To evaluate performance and volatility tolerance, we present experimental results for bag-of-tasks applications and message-passing applications. We show that the system can tolerate massive failure, and we discuss the performance of the node protection mechanism. Based on the XtremWeb project developments and evolutions, we discuss the convergence between Global Computing systems and Grid.

Dates and versions

in2p3-00163490, version 1 (17-07-2007)

Cite

F. Cappello, S. Djilali, Gilles Fedak, T. Herault, F. Magniette, et al. Computing on large-scale distributed systems: XtremWeb architecture, programming models, security, tests and convergence with grid. Future Generation Computer Systems, 2005, 21, pp.417-437. ⟨10.1016/j.future.2004.04.011⟩. ⟨in2p3-00163490⟩