Parallel Mining of Dependencies

Abstract : The problem of extracting functional dependencies (FDs) from databases has a long history dating back to the 1990s. Yet efficient solutions that take into account both hardware evolution, namely the advent of multicore machines, and the amount of data to be mined are still needed. In this paper we propose a parallel algorithm which, with small modifications, extracts (i) the minimal keys, (ii) the minimal exact FDs, (iii) the minimal approximate FDs and (iv) the conditional functional dependencies (CFDs) holding in a table. Under some natural conditions, we prove a theoretical speedup of our solution with respect to a baseline algorithm that follows a depth-first search strategy. Since mining most of these dependencies requires a procedure for computing the number of distinct values (NDV), which is a space-consuming operation, we show how sketching techniques for estimating NDV can be used to reduce both memory consumption and, when the data are distributed, communication overhead, while guaranteeing a certain quality of the result. Our solution is implemented, and the reported experimental results show the efficiency and scalability of our proposal. Most notably, the theoretical speedups are confirmed by the experiments.
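
To illustrate why NDV is central to this kind of mining, here is a minimal Python sketch. It is not the paper's algorithm. It relies on the standard characterization that an exact FD X -> A holds in a table iff NDV(X) = NDV(X ∪ {A}), and that X is a key iff NDV(X) equals the number of rows. The abstract does not say which sketching technique is used, so the K-minimum-values (KMV) estimator below is an assumption chosen for illustration, as are all function names and the default k=256.

```python
import hashlib
import heapq


def ndv(rows, attrs):
    """Exact number of distinct values of the projection of rows on attrs."""
    return len({tuple(row[a] for a in attrs) for row in rows})


def fd_holds(rows, lhs, rhs):
    """Exact FD test: lhs -> rhs holds iff NDV(lhs) == NDV(lhs + [rhs])."""
    return ndv(rows, lhs) == ndv(rows, list(lhs) + [rhs])


def is_key(rows, attrs):
    """attrs is a key iff its projection has as many distinct values as rows."""
    return ndv(rows, attrs) == len(rows)


def _hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1) (hypothetical helper)."""
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_sketch(values, k=256):
    """Keep only the k smallest distinct hash values seen in the stream."""
    kept = set()
    heap = []  # max-heap of kept hashes, stored negated
    for v in values:
        h = _hash01(v)
        if h in kept:
            continue
        if len(heap) < k:
            kept.add(h)
            heapq.heappush(heap, -h)
        elif h < -heap[0]:
            # Replace the current largest kept hash with the smaller one.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    return sorted(kept)


def kmv_estimate(sketch, k=256):
    """Estimate NDV from a KMV sketch; exact when fewer than k values were seen."""
    if len(sketch) < k:
        return len(sketch)
    return (k - 1) / sketch[-1]


def kmv_merge(a, b, k=256):
    """Sketch of the union of two partitions, built from their sketches only."""
    return sorted(set(a) | set(b))[:k]
```

Each sketch stores at most k numbers regardless of table size, and `kmv_merge` combines per-partition sketches without shipping the underlying data. This is the kind of memory and communication saving the abstract refers to. With estimated NDVs the equality test above becomes approximate, so a real implementation would need an error tolerance.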
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01010968
Contributor : Sofian Maabout
Submitted on : Saturday, June 21, 2014 - 12:20:16 PM
Last modification on : Friday, March 9, 2018 - 11:24:56 AM

Identifiers

  • HAL Id : hal-01010968, version 1

Citation

Eve Garnaud, Nicolas Hanusse, Sofian Maabout, Noël Novelli. Parallel Mining of Dependencies. The 2014 International Conference on High Performance Computing & Simulation (HPCS 2014), Jul 2014, Bologna, Italy. pp.1-8. ⟨hal-01010968⟩
