PAXQuery: Parallel Analytical XML Processing
Abstract
XQuery is a general-purpose programming language for processing semi-structured data, and as such, it is very expressive. As a consequence, optimizing and parallelizing complex analytical XQuery queries remains an open and challenging problem. We demonstrate PAXQuery, a novel system that parallelizes the execution of XQuery queries over large collections of XML documents. PAXQuery compiles a rich subset of XQuery into plans expressed in the PArallelization ConTracts (PACT) programming model. Thanks to this translation, the resulting plans are optimized and executed in a massively parallel fashion by the Apache Flink system. The result is a scalable system capable of querying massive amounts of XML data very efficiently, as shown by the experimental results we outline.
Domains
Databases [cs.DB]
Origin: Publisher files allowed on an open archive