Modeling Big Data Processing Programs

João Batista de Souza Neto; Anamaria Martins Moreira; Genoveva Vargas-Solar; Martin A Musicante

Conference paper. Year: 2020

João Batista de Souza Neto

Role: Author
PersonId: 803601
ORCID: 0000-0002-8142-2525

Anamaria Martins Moreira

Role: Author
PersonId: 755820
ORCID: 0000-0002-7707-8469

Genoveva Vargas-Solar

Role: Author
PersonId: 7250
IdHAL: genoveva-vargas-solar
ORCID: 0000-0001-9545-1821
IdRef: 113038569

Base de Données

Martin A Musicante

Role: Author

Abstract

We propose a new model for data processing programs. Our model generalizes the data flow programming style implemented by systems such as Apache Spark, DryadLINQ, Apache Beam and Apache Flink. The model uses directed acyclic graphs (DAGs) to represent the two main aspects of data flow-based systems: operations over data (filtering, aggregation, join) and program execution, which is defined by the data dependences between operations. We use Monoid Algebra to model operations over distributed, partitioned datasets and Petri Nets to represent the data and control flow. This allows a data processing program to be specified independently of the target Big Data processing system. Our model has been used to design mutation test operators for Big Data processing programs. These operators have been implemented in the testing environment TRANSMUT-Spark.
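The two ingredients named in the abstract can be illustrated with a small, hedged sketch. It is not the authors' formal model: `Monoid`, `aggregate`, `run`, and the place names `input`, `evens`, and `total` are hypothetical names chosen for this example. The sketch assumes an aggregation can be expressed as an identity element plus an associative merge, so that each partition is folded locally before the partial results are combined, and it simplifies Petri net firing to "an operation runs once all of its input places hold data" (tokens are not consumed).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Monoid:
    """An identity element plus an associative merge function."""
    zero: Any
    merge: Callable[[Any, Any], Any]


def aggregate(partitions: List[List[Any]], unit: Callable[[Any], Any], m: Monoid) -> Any:
    """Fold each partition locally, then merge the partial results."""
    partials = []
    for part in partitions:
        acc = m.zero
        for x in part:
            acc = m.merge(acc, unit(x))
        partials.append(acc)
    result = m.zero
    for p in partials:
        result = m.merge(result, p)
    return result


# A transition is (input place names, output place name, function).
Transition = Tuple[Tuple[str, ...], str, Callable[..., Any]]


def run(transitions: List[Transition], marking: Dict[str, Any]) -> Dict[str, Any]:
    """Fire each transition once, as soon as all of its input places hold data."""
    marking = dict(marking)
    pending = list(transitions)
    while pending:
        ready = [t for t in pending if all(p in marking for p in t[0])]
        if not ready:
            raise ValueError("no transition is enabled; check the dependencies")
        for inputs, output, fn in ready:
            marking[output] = fn(*(marking[p] for p in inputs))
        pending = [t for t in pending if t not in ready]
    return marking


SUM = Monoid(0, lambda a, b: a + b)

program: List[Transition] = [
    # Listed out of order on purpose: execution order comes from data dependence.
    (("evens",), "total", lambda ps: aggregate(ps, lambda x: x, SUM)),
    (("input",), "evens", lambda ps: [[x for x in p if x % 2 == 0] for p in ps]),
]

final = run(program, {"input": [[1, 2, 3], [4, 5, 6]]})
print(final["total"])  # 12
```

Because the merge is associative and has an identity, the result does not depend on how the dataset is split into partitions, which is the property that makes this style of aggregation suitable for distributed data.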

Keywords

Petri Nets; Monoid Algebra; Big Data processing; Data flow programming models

Domains

Databases [cs.DB]

Main file

Modeling_Big_Data_Processing_Programs___SBMF2020.pdf (533.39 KB)

Origin: Files produced by the author(s)

Contributor: Genoveva Vargas-Solar

https://hal.science/hal-03039212

Submitted on: Thursday, December 3, 2020, 18:20:02

Last modified on: Wednesday, July 5, 2023, 15:28:04

Long-term archiving on: Thursday, March 4, 2021, 19:52:00

Dates and versions

hal-03039212, version 1 (03-12-2020)

Identifiers

HAL Id: hal-03039212, version 1

Cite

João Batista de Souza Neto, Anamaria Martins Moreira, Genoveva Vargas-Solar, Martin A Musicante. Modeling Big Data Processing Programs. 23rd Brazilian Symposium on Formal Methods, Nov 2020, Ouro Preto, Brazil. ⟨hal-03039212⟩

Collections

CNRS UNIV-LYON1 UNIV-LYON2 INSA-LYON EC-LYON LIRIS INSA-GROUPE UDL
