Modeling Big Data Processing Programs - HAL open archive
Conference paper. Year: 2020

Modeling Big Data Processing Programs

João Batista de Souza Neto
Anamaria Martins Moreira
Genoveva Vargas-Solar
Martin A Musicante
  • Role: Author

Abstract

We propose a new model for data processing programs. Our model generalizes the data flow programming style implemented by systems such as Apache Spark, DryadLINQ, Apache Beam and Apache Flink. The model uses directed acyclic graphs (DAGs) to represent the two main aspects of data flow-based systems: operations over data (filtering, aggregation, join) and program execution, which is defined by data dependencies between operations. We use Monoid Algebra to model operations over distributed, partitioned datasets, and Petri Nets to represent data and control flow. This makes the specification of a data processing program agnostic of the target Big Data processing system. Our model has been used to design mutation test operators for big data processing programs, and these operators have been implemented in the testing environment TRANSMUT-Spark.
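The two ideas in the abstract (operations over partitioned datasets whose aggregations are defined by a monoid, and a Petri-net-style flow in which an operation fires once the data it depends on is available) can be illustrated with a small Python sketch. This is not the authors' formalism or TRANSMUT-Spark code: the names `Monoid`, `filter_op`, `aggregate_op`, `join_op`, `Transition` and `run_net` are hypothetical, and the net is deliberately simplified so that each transition fires exactly once and datasets (tokens) are not consumed.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, Dict, List

# Hypothetical encoding: a dataset is a list of partitions,
# and each partition is a list of records.
Dataset = List[List[Any]]
Operation = Callable[[List[Dataset]], Dataset]


@dataclass(frozen=True)
class Monoid:
    """An associative binary operation with an identity element."""
    identity: Any
    combine: Callable[[Any, Any], Any]


def filter_op(pred: Callable[[Any], bool]) -> Operation:
    # Filtering is applied independently to each partition.
    return lambda inputs: [[x for x in part if pred(x)] for part in inputs[0]]


def aggregate_op(m: Monoid) -> Operation:
    # Fold each partition locally, then combine the partial results.
    # Associativity of m.combine makes the result independent of partitioning.
    def run(inputs: List[Dataset]) -> Dataset:
        partials = [reduce(m.combine, part, m.identity) for part in inputs[0]]
        return [[reduce(m.combine, partials, m.identity)]]
    return run


def join_op(inputs: List[Dataset]) -> Dataset:
    # Join two datasets of (key, value) pairs on the key; the result
    # is placed in a single partition for simplicity.
    right = [rec for part in inputs[1] for rec in part]
    return [[(k, (v, w))
             for part in inputs[0] for (k, v) in part
             for (k2, w) in right if k == k2]]


@dataclass
class Transition:
    name: str
    inputs: List[str]  # input places (data dependencies)
    output: str        # output place
    op: Operation


def run_net(places: Dict[str, Dataset],
            transitions: List[Transition]) -> Dict[str, Dataset]:
    """Fire each transition once, as soon as all its input places hold a dataset.

    Simplification: datasets (tokens) are not consumed when a transition fires.
    """
    pending = list(transitions)
    while pending:
        enabled = [t for t in pending if all(p in places for p in t.inputs)]
        if not enabled:
            raise ValueError("some transitions can never fire")
        for t in enabled:
            places[t.output] = t.op([places[p] for p in t.inputs])
            pending.remove(t)
    return places


# Usage: keep the even numbers, then sum them with the (0, +) monoid.
# The transitions are listed out of order; data dependence decides when each fires.
sum_monoid = Monoid(0, lambda a, b: a + b)
net = [
    Transition("total", ["even"], "sum", aggregate_op(sum_monoid)),
    Transition("evens", ["numbers"], "even", filter_op(lambda x: x % 2 == 0)),
]
result = run_net({"numbers": [[1, 2, 3], [4, 5], [6]]}, net)
print(result["sum"])  # [[12]]
```

Because `combine` is associative and `identity` is neutral, folding each partition separately and then merging the partial results yields the same value as folding the whole dataset at once, which is why the aggregation does not depend on how the data happens to be partitioned.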
Main file
Modeling_Big_Data_Processing_Programs___SBMF2020.pdf (533.39 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03039212, version 1 (03-12-2020)

Identifiers

  • HAL Id: hal-03039212, version 1

Cite

João Batista de Souza Neto, Anamaria Martins Moreira, Genoveva Vargas-Solar, Martin A Musicante. Modeling Big Data Processing Programs. 23rd Brazilian Symposium on Formal Methods (SBMF 2020), Nov 2020, Ouro Preto, Brazil. ⟨hal-03039212⟩
