Conference papers

Data Exploration with SQL using Machine Learning Techniques

Abstract : Data scientists now have access to very large volumes of data, much of it accessible through SQL. Despite the inherent simplicity of SQL, writing relevant and efficient SQL queries is known to be difficult, especially for databases with a large number of attributes or meaningless attribute names. In this paper, we propose a "rewriting" technique that helps data scientists formulate SQL queries and explore their big data rapidly and intuitively, while keeping user input to a minimum, with no manual tuple specification or labeling. For a user-specified query, we define a negation query, which produces tuples that are not wanted in the initial query's answer. Since there is an exponential number of such negation queries, we describe a pseudo-polynomial heuristic to pick the negation closest in size to the initial query, and construct a balanced learning set whose positive examples correspond to the results desired by analysts and whose negative examples correspond to those they do not want. The initial query is then reformulated using machine learning techniques, yielding a new query that is more efficient and diverse. We have implemented a prototype and conducted experiments on real-life datasets and synthetic query workloads to assess the scalability and precision of our proposal. A preliminary qualitative experiment conducted with astrophysicists is also described.
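As an illustration only, the learning-set construction described in the abstract might be sketched as follows. This is not the authors' implementation: it assumes the initial query is a conjunction of predicates, reads "negation query" as the same conjunction with at least one predicate negated, and replaces the paper's pseudo-polynomial heuristic with plain brute-force enumeration. The `stars` table, its columns, and all function names are hypothetical.

```python
import itertools
import sqlite3


def negation_queries(predicates):
    """Yield each conjunction obtained by negating a non-empty subset of predicates.

    Assumption: this is one plausible reading of "negation query";
    it produces 2**n - 1 candidates for n predicates.
    """
    for mask in itertools.product([False, True], repeat=len(predicates)):
        if any(mask):
            yield [f"NOT ({p})" if neg else f"({p})"
                   for p, neg in zip(predicates, mask)]


def count_rows(conn, table, conditions):
    """Count the rows of `table` that satisfy the conjunction of `conditions`."""
    where = " AND ".join(conditions)
    return conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {where}").fetchone()[0]


def closest_negation(conn, table, predicates):
    """Brute-force stand-in for the paper's heuristic: pick the negation
    whose answer size is closest to that of the initial query."""
    target = count_rows(conn, table, [f"({p})" for p in predicates])
    return min(negation_queries(predicates),
               key=lambda conds: abs(count_rows(conn, table, conds) - target))


def learning_set(conn, table, predicates, negation):
    """Label answers of the initial query 1 and answers of the negation 0."""
    pos_where = " AND ".join(f"({p})" for p in predicates)
    neg_where = " AND ".join(negation)
    pos = conn.execute(f"SELECT * FROM {table} WHERE {pos_where}").fetchall()
    neg = conn.execute(f"SELECT * FROM {table} WHERE {neg_where}").fetchall()
    return [(row, 1) for row in pos] + [(row, 0) for row in neg]


# Hypothetical toy data, loosely inspired by the astrophysics use case.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stars (id INTEGER, mag REAL, dist REAL)")
conn.executemany(
    "INSERT INTO stars VALUES (?, ?, ?)",
    [(1, 5.0, 10.0), (2, 6.0, 20.0), (3, 12.0, 15.0),
     (4, 13.0, 200.0), (5, 14.0, 300.0), (6, 4.0, 500.0)],
)

predicates = ["mag < 10", "dist < 100"]   # initial query, as a conjunction
chosen = closest_negation(conn, "stars", predicates)
examples = learning_set(conn, "stars", predicates, chosen)
```

The labeled rows in `examples` could then be passed to any classifier whose model can be translated back into SQL predicates; the abstract only says "machine learning techniques", so that step is left out here. Enumerating every negation is exponential in the number of predicates, which is the cost the heuristic in the paper is meant to avoid.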

Contributor : Vasile-Marian Scuturici
Submitted on : Friday, February 3, 2017 - 5:29:24 PM
Last modification on : Friday, September 30, 2022 - 11:34:16 AM
Long-term archiving on : Friday, May 5, 2017 - 1:50:48 PM




  • HAL Id : hal-01455715, version 1


Julien Cumin, Jean-Marc Petit, Vasile-Marian Scuturici, Sabina Surdu. Data Exploration with SQL using Machine Learning Techniques. International Conference on Extending Database Technology - EDBT, Mar 2017, Venice, Italy. pp.96-107. ⟨hal-01455715⟩


