PAXQuery: Parallel Analytical XML Processing

Jesús Camacho-Rodríguez (1), Dario Colazzo (2), Ioana Manolescu (3, 4), Juan A. M. Naranjo (3, 4)
(3) OAK - Database optimizations and architectures for complex large data
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
Abstract : XQuery is a general-purpose programming language for processing semi-structured data, and as such, it is very expressive. As a consequence, optimizing and parallelizing complex analytical XQuery queries is still an open, challenging problem. We demonstrate PAXQuery, a novel system that parallelizes the execution of XQuery queries over large collections of XML documents. PAXQuery compiles a rich subset of XQuery into plans expressed in the PArallelization ConTracts (PACT) programming model. Thanks to this translation, the resulting plans are optimized and executed in a massively parallel fashion by the Apache Flink system. The result is a scalable system capable of querying massive amounts of XML data very efficiently, as shown by the experimental results we outline.
Document type : Conference papers

https://hal.archives-ouvertes.fr/hal-01178490
Contributor : Jesús Camacho-Rodríguez
Submitted on : Monday, July 20, 2015 - 11:27:55 AM
Last modification on : Monday, May 28, 2018 - 2:38:02 PM
Long-term archiving on : Wednesday, October 21, 2015 - 5:12:35 PM

File

PAXQuery-SIGMOD2015.pdf
Publisher files allowed on an open archive

Citation

Jesús Camacho-Rodríguez, Dario Colazzo, Ioana Manolescu, Juan A. M. Naranjo. PAXQuery: Parallel Analytical XML Processing. ACM SIGMOD International Conference on Management of Data 2015, May 2015, Melbourne, Victoria, Australia. pp.1117-1122, ⟨10.1145/2723372.2735374⟩. ⟨hal-01178490⟩
