On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark

Olivier Curé; Hubert Naacke; Mohamed-Amine Baazizi; Bernd Amann

doi:10.48550/arXiv.1507.02321

Conference paper, Year: 2015

Olivier Curé

Role: Author
PersonId: 18350
IdHAL: olivier-cure
IdRef: 153626011

Laboratoire d'Informatique Gaspard-Monge

Hubert Naacke

Role: Author
PersonId: 9627
IdHAL: hubert-naacke
ORCID: 0000-0003-0559-9908
IdRef: 06104203X

Bases de Données

Mohamed-Amine Baazizi

Role: Author
PersonId: 13062
IdHAL: mohamed-amine-baazizi
IdRef: 162548923

Bases de Données

Bernd Amann

Role: Author
PersonId: 3057
IdHAL: bernd-amann
ORCID: 0000-0002-6822-4049
IdRef: 060259418

Bases de Données

Abstract

Querying very large RDF data sets in an efficient and scalable manner requires parallel query plans combined with appropriate data distribution strategies. Several innovative solutions have recently been proposed for optimizing data distribution with or without predefined query workloads. This paper presents an in-depth analysis and experimental comparison of five representative RDF data distribution approaches. To obtain fair experimental results, we use Apache Spark as a common parallel computing framework and rewrite the algorithms concerned using the Spark API. Spark provides guarantees in terms of fault tolerance, high availability, and scalability, which are essential in such systems. Our implementations aim to highlight the fundamental, implementation-independent characteristics of each approach in terms of data preparation, load balancing, data replication and, to some extent, query answering cost and performance. The reported measurements are obtained by testing each system on one synthetic and one real-world data set, over query workloads with differing characteristics and under different partitioning constraints.

Domains

Computer Science [cs]; Databases [cs.DB]; Distributed, Parallel, and Cluster Computing [cs.DC]

Contributor: Lip6 Publications

https://hal.science/hal-01214902

Submitted on: Tuesday, October 13, 2015, 11:58:12

Last modified on: Thursday, April 4, 2024, 03:28:34

Dates and versions

hal-01214902, version 1 (13-10-2015)

Identifiers

HAL Id: hal-01214902, version 1
arXiv: 1507.02321
DOI: 10.48550/arXiv.1507.02321

Cite

Olivier Curé, Hubert Naacke, Mohamed-Amine Baazizi, Bernd Amann. On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark. The 11th International Workshop on Scalable Semantic Web Knowledge Base Systems, Oct 2015, Bethlehem, Pennsylvania, United States. pp.16-31, ⟨10.48550/arXiv.1507.02321⟩. ⟨hal-01214902⟩

Collections

ENPC UPMC CNRS PARISTECH LIGM LIGM_MOA LIP6 ESIEE-PARIS SORBONNE-UNIVERSITE SU-SCIENCES UNIV-EIFFEL LIGM_BAAM JSE2024
