Conference papers

Modeling Big Data Processing Programs

João Batista de Souza Neto, Anamaria Martins Moreira, Genoveva Vargas-Solar 1, Martin A Musicante
1 BD - Base de Données
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : We propose a new model for data processing programs. The model generalizes the data flow programming style implemented by systems such as Apache Spark, DryadLINQ, Apache Beam and Apache Flink. It uses directed acyclic graphs (DAGs) to represent the two main aspects of data flow-based systems: operations over data (filtering, aggregation, join) and program execution, which is defined by the data dependencies between operations. We use Monoid Algebra to model operations over distributed, partitioned datasets and Petri Nets to represent the data/control flow. This makes the specification of a data processing program agnostic of the target Big Data processing system. The model has been used to design mutation test operators for big data processing programs, and these operators have been implemented in the testing environment TRANSMUT-Spark.
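As an informal illustration of why a monoid is a convenient basis for operations over partitioned datasets, the following minimal Python sketch folds each partition separately and then merges the partial results. The names `Monoid` and `hom` are hypothetical and chosen for this example only; they are not the notation or the implementation used in the paper.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, List, TypeVar

A = TypeVar("A")
M = TypeVar("M")


@dataclass(frozen=True)
class Monoid(Generic[M]):
    """Hypothetical helper: an identity element plus an associative merge."""
    zero: M
    merge: Callable[[M, M], M]


def hom(monoid: Monoid, unit: Callable[[A], M], partitions: List[List[A]]) -> M:
    """Lift each element with `unit`, fold inside each partition,
    then merge the per-partition results."""
    partials = [
        reduce(monoid.merge, (unit(x) for x in part), monoid.zero)
        for part in partitions
    ]
    return reduce(monoid.merge, partials, monoid.zero)


SUM = Monoid(0, lambda a, b: a + b)    # aggregation
LIST = Monoid([], lambda a, b: a + b)  # collection building (order-preserving)

data = [[1, 2, 3], [4, 5], [6]]        # a dataset split into three partitions

# Aggregation: sum of all elements.
total = hom(SUM, lambda x: x, data)

# Filtering: keep even elements by mapping the others to the empty list.
evens = hom(LIST, lambda x: [x] if x % 2 == 0 else [], data)

# Because merge is associative, a different partitioning of the same
# elements (in the same order) yields the same results.
repartitioned = [[1], [2, 3, 4, 5, 6]]
same_total = hom(SUM, lambda x: x, repartitioned)
```

In this sketch, filtering and aggregation are both expressed through the same `hom` function, and the result does not depend on where the partition boundaries fall.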
Document type : Conference papers
Contributor : Genoveva Vargas-Solar
Submitted on : Thursday, December 3, 2020 - 6:20:02 PM
Last modification on : Friday, September 30, 2022 - 11:34:16 AM
Long-term archiving on : Thursday, March 4, 2021 - 7:52:00 PM


Files produced by the author(s)


  • HAL Id : hal-03039212, version 1


João Batista de Souza Neto, Anamaria Martins Moreira, Genoveva Vargas-Solar, Martin A Musicante. Modeling Big Data Processing Programs. 23rd Brazilian Symposium on Formal Methods, Nov 2020, Ouro Preto, Brazil. ⟨hal-03039212⟩