
Safely Managing Data Variety in Big Data Software Development

Abstract : We consider the task of building Big Data software systems offered as software-as-a-service. These applications are commonly backed by NoSQL data stores that address the proverbial Vs of Big Data processing: they can handle large volumes of data, and many do not enforce a global schema, which accommodates structural variety in data. Software engineers can thus design the data model on the go, a flexibility that is particularly crucial in agile software development. However, NoSQL data stores commonly do not yet account for veracity when the structure of persisted data changes, although such changes are an inevitable consequence of agile software development. In most NoSQL-based application stacks, schema evolution is handled entirely within the application code, usually involving object mapper libraries. Yet simple code refactorings, such as renaming a class attribute at the source code level, can cause data loss or runtime errors once the application has been deployed to production. We address this pain point by contributing type checking rules that we have implemented within an IDE plugin. Our plugin ControVol statically type checks the object mapper class declarations against the code release history. ControVol is thus capable of detecting common yet risky cases of mismatched data and schema, and can even suggest automatic fixes.
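
The renaming hazard described in the abstract can be illustrated with a small, self-contained sketch. It does not use ControVol or any real object mapper: the classes `PlayerV1` and `PlayerV2`, the toy `load` and `save` functions, and the `check_release` comparison are all hypothetical names introduced here for illustration, and the check is a much simpler stand-in for the type checking rules the paper contributes.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PlayerV1:
    # Hypothetical mapped class as declared in an earlier release.
    name: Optional[str] = None
    points: Optional[int] = None


@dataclass
class PlayerV2:
    # Later release: the attribute "points" was renamed to "score".
    name: Optional[str] = None
    score: Optional[int] = None


def load(cls, record):
    """Toy mapper (assumption): copy keys the class declares, ignore the rest."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in record.items() if k in known})


def save(obj):
    """Toy mapper (assumption): persist every declared attribute."""
    return {f.name: getattr(obj, f.name) for f in fields(obj)}


def check_release(old_cls, new_cls):
    """Flag attributes of the old release that the new release no longer declares."""
    old = {f.name for f in fields(old_cls)}
    new = {f.name for f in fields(new_cls)}
    return [
        "attribute '" + name + "' is no longer declared; legacy values may be lost"
        for name in sorted(old - new)
    ]


# An entity persisted by the earlier release of the application.
legacy_record = save(PlayerV1(name="Ann", points=42))

# The later release loads it: "points" is silently ignored, "score" stays unset.
reloaded = load(PlayerV2, legacy_record)

# Saving the entity again overwrites the record, and the old value is gone.
rewritten = save(reloaded)

# Comparing the two releases before deployment surfaces the problem.
warnings = check_release(PlayerV1, PlayerV2)
```

In this sketch the data loss only becomes visible after the entity has been rewritten, whereas comparing the class declarations of the two releases reports the risk before deployment. That is the general idea behind checking mapper classes against the release history, shown here under the simplifying assumptions stated above.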
Document type :
Conference papers


https://hal.archives-ouvertes.fr/hal-01207655
Contributor : Thomas Cerqueus
Submitted on : Thursday, October 1, 2015 - 10:24:26 AM
Last modification on : Wednesday, July 8, 2020 - 12:43:32 PM
Long-term archiving on : Saturday, January 2, 2016 - 10:52:31 AM

File

icse.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01207655, version 1

Citation

Thomas Cerqueus, Eduardo Cunha de Almeida, Stefanie Scherzinger. Safely Managing Data Variety in Big Data Software Development. 1st IEEE/ACM International Workshop on Big Data Software Engineering, May 2015, Florence, Italy. ⟨hal-01207655⟩

