Geographical failover for the EGEE-WLCG Grid collaboration tools

Gilles Mathieu; Cyril L'Orphelin; Osman Aidel; Alessandro Cavalli; Alfredo Pagano; Rafal Lichwala

doi:10.1088/1742-6596/119/6/062022

Conference paper, Journal of Physics: Conference Series, Year: 2008

Gilles Mathieu

Role: Author
PersonId: 927284

Centre de Calcul de l'IN2P3

Cyril L'Orphelin

Role: Author
PersonId: 927287

Centre de Calcul de l'IN2P3

Osman Aidel

Role: Author
PersonId: 927288

Centre de Calcul de l'IN2P3

Alessandro Cavalli

Role: Author
PersonId: 927290

Istituto Nazionale di Fisica Nucleare, Sezione di Bologna

Alfredo Pagano

Role: Author
PersonId: 927289

Istituto Nazionale di Fisica Nucleare, Sezione di Bologna

Rafal Lichwala

Role: Author
PersonId: 927291

Poznan Supercomputing and Networking Center

Abstract

Worldwide grid projects such as EGEE and WLCG need highly available services, not only for grid usage but also for the associated operations. In particular, the tools used for daily activities and operational procedures are considered critical. In this context, the goal of the EGEE failover work is to propose, implement and document well-established mechanisms and procedures that limit outages of the operations and monitoring tools used by regional and global grid operators to control the status of the EGEE grid. The operations activity of EGEE relies on different tools developed by teams from different countries. Prior to this work, only one instance of each tool was deployed, so each tool was a single point of failure. We addressed the problem by replicating the tools at different sites and using specific DNS features to switch automatically to another service instance in case of failure. After a DNS test phase in a virtual machine (VM) environment, focused on nsupdate, NS/zone configuration and short TTLs, a new domain for grid operations (gridops.org) was registered. In addition, the replication of databases, web servers and web services has been investigated and configured. In this paper, we describe the technical mechanism used in our approach and present the replication procedure implemented for the EGEE/WLCG CIC Operations Portal use case. We also discuss the relevance of failover procedures to other grid projects and grid services, and describe plans for future improvements of the procedures.
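As an illustration of the DNS-based switch mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a dynamic-update-enabled zone and builds the batch of commands that the `nsupdate` utility reads from standard input. The name server `ns.example.org`, the host name `cic.gridops.org.`, the IP addresses, the 60-second TTL and both function names are hypothetical; only the zone name gridops.org and the use of nsupdate with short TTLs come from the abstract.

```python
from typing import Callable, Optional, Sequence


def pick_active_instance(
    candidates: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate address that passes the health check.

    `candidates` is ordered by preference (primary first, then replicas).
    Returns None when no instance is healthy.
    """
    for address in candidates:
        if is_healthy(address):
            return address
    return None


def build_nsupdate_script(
    server: str, zone: str, name: str, address: str, ttl: int = 60
) -> str:
    """Build an nsupdate batch that repoints `name` to `address`.

    The short TTL (an assumed value) limits how long resolvers keep
    serving the address of a failed instance from their caches.
    """
    lines = [
        f"server {server}",                      # name server receiving the update
        f"zone {zone}",                          # zone that contains the record
        f"update delete {name} A",               # drop the current A record(s)
        f"update add {name} {ttl} A {address}",  # point the name at the replica
        "send",                                  # submit the update
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical instances: a primary and one replica (documentation IPs).
    instances = ["192.0.2.10", "198.51.100.20"]
    # Stand-in health check that pretends the primary is down.
    target = pick_active_instance(instances, lambda a: a != "192.0.2.10")
    if target is not None:
        print(build_nsupdate_script(
            "ns.example.org", "gridops.org", "cic.gridops.org.", target
        ))
```

In practice the generated batch would be piped to `nsupdate`, typically authenticated with a key file through its `-k` option, and the health check would be replaced by a real probe of the service.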

Keywords

Failover, replication, redundancy, Grid

Domains

Databases [cs.DB], Hardware Architecture [cs.AR], Web

Contributor: Gilles Mathieu

https://hal.science/hal-00715013

Submitted on: Friday, 6 July 2012, 10:54:42

Last modified on: Thursday, 11 April 2024, 13:18:11

Dates and versions

hal-00715013, version 1 (06-07-2012)

Identifiers

HAL Id: hal-00715013, version 1
DOI: 10.1088/1742-6596/119/6/062022
INSPIRE: 803937

Cite

Gilles Mathieu, Cyril L'Orphelin, Osman Aidel, Alessandro Cavalli, Alfredo Pagano, et al. Geographical failover for the EGEE-WLCG Grid collaboration tools. 16th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2007), Sep 2007, Victoria, Canada. pp.062022, ⟨10.1088/1742-6596/119/6/062022⟩. ⟨hal-00715013⟩

Collections

IN2P3 CNRS CC-IN2P3
