Conference papers

Modeling Big Data Processing Programs

João Batista de Souza Neto, Anamaria Martins Moreira, Genoveva Vargas-Solar 1, Martin Musicante
1 BD - Base de Données
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : We propose a new model for data processing programs. Our model generalizes the data flow programming style implemented by systems such as Apache Spark, DryadLINQ, Apache Beam and Apache Flink. The model uses directed acyclic graphs (DAGs) to represent the main aspects of data flow-based systems, namely operations over data (filtering, aggregation, join) and program execution defined by the data dependencies between operations. We use Monoid Algebra to model operations over distributed, partitioned datasets and Petri Nets to represent the data/control flow. This makes the specification of a data processing program agnostic of the target Big Data processing system. Our model has been used to design mutation testing operators for Big Data processing programs. These operators have been implemented in the testing environment TRANSMUT-Spark.
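The two ingredients named in the abstract can be illustrated with a small sketch. It is not the authors' formalization: the names `Monoid`, `aggregate`, `run`, and the three-node example program are hypothetical, and a plain dependency graph from Python's standard library stands in for the Petri Net. It only shows that a monoid (an associative operation with an identity element) lets an aggregation be computed per partition and then merged, and that a DAG of data dependencies determines the execution order.

```python
from dataclasses import dataclass
from functools import reduce
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable


@dataclass(frozen=True)
class Monoid:
    """An associative binary operation together with its identity element."""
    zero: Any
    combine: Callable[[Any, Any], Any]


def aggregate(partitions, monoid):
    """Fold each partition locally, then merge the partial results.

    Associativity of `combine` makes the result independent of how the
    data is split into partitions.
    """
    partials = [reduce(monoid.combine, part, monoid.zero) for part in partitions]
    return reduce(monoid.combine, partials, monoid.zero)


def run(dag, operations):
    """Execute operations in an order compatible with the data dependencies.

    `dag` maps each node to the list of nodes it depends on (assumed layout).
    """
    results = {}
    for node in TopologicalSorter(dag).static_order():
        inputs = [results[dep] for dep in dag[node]]
        results[node] = operations[node](*inputs)
    return results


SUM = Monoid(zero=0, combine=lambda a, b: a + b)

# Hypothetical program: source -> evens (filter) -> total (aggregation).
partitions = [[1, 2, 3], [4, 5], [6]]
dag = {"source": [], "evens": ["source"], "total": ["evens"]}
operations = {
    "source": lambda: partitions,
    "evens": lambda parts: [[x for x in p if x % 2 == 0] for p in parts],
    "total": lambda parts: aggregate(parts, SUM),
}

results = run(dag, operations)
print(results["total"])
```

Because the program is described only by its operations and their dependencies, nothing in the sketch refers to a particular execution engine, which is the kind of system independence the abstract describes.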
Document type : Conference papers

https://hal.archives-ouvertes.fr/hal-03039212
Contributor : Genoveva Vargas-Solar
Submitted on : Thursday, December 3, 2020 - 6:20:02 PM
Last modification on : Tuesday, June 1, 2021 - 2:08:08 PM
Long-term archiving on : Thursday, March 4, 2021 - 7:52:00 PM

File

Modeling_Big_Data_Processing_P...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03039212, version 1

Citation

João Batista de Souza Neto, Anamaria Martins Moreira, Genoveva Vargas-Solar, Martin Musicante. Modeling Big Data Processing Programs. 23rd Brazilian Symposium on Formal Methods, Nov 2020, Ouro Preto, Brazil. ⟨hal-03039212⟩

Metrics

Record views : 46
File downloads : 166