Machine Learning to Data Management: A Round Trip

Laure Berti-Équille 1, Angela Bonifati 2, Tova Milo 3
2 BD - Base de Données
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : With the emergence of machine learning (ML) techniques in database research, ML has already shown tremendous potential to reshape the foundations, algorithms, and models of several data management tasks, such as error detection, data cleaning, data integration, and query inference. Parts of the data preparation, standardization, and cleaning processes, such as data matching and deduplication, could be automated by having an ML model "learn" and predict the matches routinely. Data integration can also benefit from ML, as the data to be integrated can be sampled and used to design the data integration algorithms. After the initial manual work to set up the labels, ML models can start learning from the new incoming data submitted for standardization, integration, and cleaning. The more data supplied to the model, the better the ML algorithm can perform and the more accurate its results. ML is therefore more scalable than traditional, time-consuming approaches. Nevertheless, many ML algorithms cannot be used out of the box: they require tuning, and their parameters and scope are often not adapted to the problem at hand. For example, in cleaning and integration processes, the window sizes of values used for the ML models cannot be chosen arbitrarily and require an adaptation of the learning parameters. This tutorial will survey the recent trend of applying machine learning solutions to improve data management tasks and to establish new paradigms that sharpen data error detection, cleaning, and integration at the data instance level, as well as at the schema, system, and user levels.

https://hal.archives-ouvertes.fr/hal-01795315
Contributor : Laure Berti-Equille
Submitted on : Friday, May 18, 2018 - 12:34:10 PM
Last modification on : Thursday, February 7, 2019 - 3:26:19 PM
Long-term archiving on : Monday, September 24, 2018 - 1:12:30 PM

File

PID5217775.pdf
Files produced by the author(s)

Citation

Laure Berti-Équille, Angela Bonifati, Tova Milo. Machine Learning to Data Management: A Round Trip. Proceedings of the 34th IEEE International Conference on Data Engineering (ICDE), Apr 2018, Paris, France. pp.1735-1738, ⟨10.1109/ICDE.2018.00226⟩. ⟨hal-01795315⟩
