Scalable Distributed Virtual Data Structures
Abstract
The MapReduce framework supports distributed, parallel scans of large stored files, followed by aggregation of the results. We propose scalable distributed data structures that extend the MapReduce framework to virtual files. The records of a virtual file are not stored; they are materialized dynamically at cloud nodes by some Big Computation. A major constraint on a virtual file is a client-defined, reasonable bound on the processing time at each node, e.g., 10 min. We define two schemes for virtual files, called VH* and VR*. They provide scalable distributed hash partitioning and range partitioning, respectively, and both respect the client-imposed time limit. We show their usefulness by applying them to two problems: key recovery and the knapsack problem.
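To make the role of the time bound concrete, the following is a minimal illustrative sketch, not the VH* or VR* schemes themselves. It assumes a virtual file whose records are identified by integer keys 0..M-1, nodes with a known and uniform throughput, and a node count fixed in advance; the function names (nodes_needed, range_partition, hash_partition) and the modulo hash are hypothetical choices made only for this example.

```python
def nodes_needed(total_records: int, records_per_second: float,
                 time_limit_s: float) -> int:
    """Smallest node count whose per-node share fits within the time limit.

    Assumption: every node processes records at the same, known rate.
    """
    capacity = int(records_per_second * time_limit_s)  # records per node in time
    if capacity < 1:
        raise ValueError("a node cannot process one record within the limit")
    return -(-total_records // capacity)  # integer ceiling division


def range_partition(total_records: int, n_nodes: int) -> list[range]:
    """Split keys 0..total_records-1 into contiguous, near-equal ranges."""
    base, extra = divmod(total_records, n_nodes)
    parts, start = [], 0
    for i in range(n_nodes):
        size = base + (1 if i < extra else 0)  # first `extra` nodes get one more
        parts.append(range(start, start + size))
        start += size
    return parts


def hash_partition(total_records: int, n_nodes: int, node: int) -> range:
    """Keys assigned to `node` under a simple modulo hash (key % n_nodes)."""
    return range(node, total_records, n_nodes)


if __name__ == "__main__":
    # Hypothetical numbers: 1,000,000 virtual records, 100 records/s per node,
    # and the 10 min (600 s) bound mentioned in the abstract.
    n = nodes_needed(1_000_000, 100, 600)
    print("nodes:", n)
    print("first range:", range_partition(1_000_000, n)[0])
    print("hash bucket of node 1:", hash_partition(1_000_000, n, 1))
```

Because the records are virtual, each node only needs the description of its range or hash bucket and can generate the records locally. A scalable scheme would instead grow the node count incrementally as nodes discover that their share exceeds the bound; this sketch fixes the count up front purely for illustration.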