Conference paper, Year: 2014

Parallel Mining of Dependencies

Abstract

The problem of extracting functional dependencies (FDs) from databases has a long history dating back to the 1990s. Efficient solutions that take into account both hardware evolution, namely the advent of multicore machines, and the amount of data to be mined are still needed. In this paper we propose a parallel algorithm which, with small modifications, extracts (i) the minimal keys, (ii) the minimal exact FDs, (iii) the minimal approximate FDs and (iv) the conditional functional dependencies (CFDs) holding in a table. Under some natural conditions, we prove a theoretical speedup of our solution with respect to a baseline algorithm that follows a depth-first search strategy. Since mining most of these dependencies requires a procedure for computing the number of distinct values (NDV), which is a space-consuming operation, we show how sketching techniques for estimating NDV can be used to reduce both memory consumption and, when the data are distributed, communication overhead, while guaranteeing a certain quality of the result. Our solution is implemented, and the experimental results reported here show the efficiency and scalability of our proposal. Most notably, the theoretical speedups are confirmed by the experiments.
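To illustrate why NDV matters for dependency mining, the sketch below uses the standard characterization that X is a key when NDV(X) equals the number of rows, and that the exact FD X → A holds when NDV(X) = NDV(X ∪ {A}). It is a minimal sequential illustration with exact counting, not the parallel algorithm or the sketching technique of the paper; the helper names (`ndv`, `is_key`, `fd_holds`) and the toy table are assumptions made for this example.

```python
from typing import Hashable, Sequence

Row = Sequence[Hashable]


def ndv(rows: Sequence[Row], cols: Sequence[int]) -> int:
    """Exact number of distinct values of the projection of rows onto cols."""
    return len({tuple(row[c] for c in cols) for row in rows})


def is_key(rows: Sequence[Row], cols: Sequence[int]) -> bool:
    """cols is a key when no two rows agree on all of cols."""
    return ndv(rows, cols) == len(rows)


def fd_holds(rows: Sequence[Row], lhs: Sequence[int], rhs: int) -> bool:
    """lhs -> rhs holds exactly when adding rhs creates no new distinct value."""
    return ndv(rows, lhs) == ndv(rows, sorted(set(lhs) | {rhs}))


# Hypothetical toy table: column 0 = city, 1 = zip code, 2 = name.
TABLE = [
    ("Bordeaux", "33000", "Ann"),
    ("Bordeaux", "33000", "Bob"),
    ("Bordeaux", "33800", "Cy"),
    ("Marseille", "13001", "Dee"),
]

zip_determines_city = fd_holds(TABLE, [1], 0)   # zip -> city
city_determines_zip = fd_holds(TABLE, [0], 1)   # city -> zip
name_is_key = is_key(TABLE, [2])
```

Every test reduces to comparing two counts, which is why replacing the exact set-based count with an approximate estimator trades accuracy for memory, as the abstract describes.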
No file deposited

Dates and versions

hal-01010968, version 1 (21-06-2014)

Identifiers

  • HAL Id: hal-01010968, version 1

Cite

Eve Garnaud, Nicolas Hanusse, Sofian Maabout, Noël Novelli. Parallel Mining of Dependencies. The 2014 International Conference on High Performance Computing & Simulation (HPCS 2014), Jul 2014, Bologna, Italy. pp.1-8. ⟨hal-01010968⟩