Effective and Efficient Similarity Search in Scientific Workflow Repositories

Johannes Starlinger 1 Sarah Cohen-Boulakia 2, 3, 4, 5 Sanjeev Khanna 6 Susan Davidson 6 Ulf Leser 1
3 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
4 VIRTUAL PLANTS - Modeling plant morphogenesis at different scales, from genes to phenotype
CRISAM - Inria Sophia Antipolis - Méditerranée , INRA - Institut National de la Recherche Agronomique, Centre de coopération internationale en recherche agronomique pour le développement [CIRAD] : UMR51
Abstract: Scientific workflows have become a valuable tool for large-scale data processing and analysis. This has led to the creation of specialized online repositories to facilitate workflow sharing and reuse. Over time, these repositories have grown to sizes that call for advanced methods to support workflow discovery, in particular similarity search. Effective similarity search requires both high-quality algorithms for comparing scientific workflows and efficient strategies for indexing, searching, and ranking search results. Yet the graph structure of scientific workflows poses severe challenges to each of these steps. Here, we present a complete system for effective and efficient similarity search in scientific workflow repositories, based on the Layer Decomposition approach to scientific workflow comparison. Layer Decomposition specifically accounts for the directed dataflow underlying scientific workflows and, compared to other state-of-the-art methods, delivers the best results for similarity search at comparatively low runtimes. Stacking Layer Decomposition with even faster, structure-agnostic approaches allows us to use proven, off-the-shelf tools for workflow indexing, further reducing runtimes and scaling similarity search to the sizes of current repositories.
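The abstract describes stacking a fast, structure-agnostic comparison in front of the slower, structure-aware Layer Decomposition. As a rough illustration of that filter-and-refine pattern only, the sketch below assumes a toy workflow representation (a set of module labels plus a set of directed dataflow edges) and two hypothetical scorers: a bag-of-modules Jaccard score for the cheap first stage and an edge-overlap Jaccard score standing in for the structural second stage. Neither scorer is the paper's Layer Decomposition algorithm, and the names `stacked_search`, `k_filter`, and `k_result` are illustrative assumptions, not the authors' API.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy representation (an assumption, not the paper's data model):
# (set of module labels, set of directed dataflow edges between module labels)
Workflow = Tuple[FrozenSet[str], FrozenSet[Tuple[str, str]]]
Scorer = Callable[[Workflow, Workflow], float]


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def bag_of_modules_sim(w1: Workflow, w2: Workflow) -> float:
    """Cheap, structure-agnostic score: compares module labels only."""
    return jaccard(set(w1[0]), set(w2[0]))


def edge_overlap_sim(w1: Workflow, w2: Workflow) -> float:
    """Hypothetical structure-aware stand-in (NOT Layer Decomposition):
    compares the directed dataflow edges of the two workflows."""
    return jaccard(set(w1[1]), set(w2[1]))


def stacked_search(
    query: Workflow,
    repo: Dict[str, Workflow],
    k_filter: int,
    k_result: int,
    fast: Scorer = bag_of_modules_sim,
    precise: Scorer = edge_overlap_sim,
) -> List[str]:
    """Filter-and-refine ranking: keep the k_filter best candidates under the
    cheap scorer, then rerank only those with the more expensive scorer."""
    # Stage 1: score every workflow with the fast scorer and keep the top k_filter.
    # sorted() is stable, so ties keep the repository's insertion order.
    candidates = sorted(
        repo.items(), key=lambda kv: fast(query, kv[1]), reverse=True
    )[:k_filter]
    # Stage 2: rerank only the surviving candidates with the precise scorer.
    reranked = sorted(
        candidates, key=lambda kv: precise(query, kv[1]), reverse=True
    )
    return [wf_id for wf_id, _ in reranked[:k_result]]


# Toy repository: wf_a and wf_b use the same modules but wire them differently.
repo: Dict[str, Workflow] = {
    "wf_b": (frozenset({"fetch", "blast", "parse"}),
             frozenset({("fetch", "parse"), ("parse", "blast")})),
    "wf_a": (frozenset({"fetch", "blast", "parse"}),
             frozenset({("fetch", "blast"), ("blast", "parse")})),
    "wf_c": (frozenset({"align", "plot"}),
             frozenset({("align", "plot")})),
}
query: Workflow = (frozenset({"fetch", "blast", "parse"}),
                   frozenset({("fetch", "blast"), ("blast", "parse")}))

print(stacked_search(query, repo, k_filter=2, k_result=2))  # ['wf_a', 'wf_b']
```

Here the first stage cannot tell `wf_a` from `wf_b` because both use identical modules; only the structure-aware second stage ranks `wf_a` first. The point of the design is that the expensive comparison runs on at most `k_filter` workflows instead of the whole repository. In the paper's setting the first stage can be served by off-the-shelf indexing tools, which this in-memory sort does not model.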
Document type: Journal article
Future Generation Computer Systems, Elsevier, 2016, 56, pp.584-594. 〈10.1016/j.future.2015.06.012〉


https://hal.archives-ouvertes.fr/hal-01170597
Contributor: Sarah Cohen-Boulakia
Submitted on: Tuesday, 1 December 2015, 10:53:27
Last modified on: Friday, 9 June 2017, 10:42:44
Archived on: Wednesday, 2 March 2016, 11:51:27

File: StarlingerCohen-BoulakiaEtAl.p... (files produced by the author(s))


Citation

Johannes Starlinger, Sarah Cohen-Boulakia, Sanjeev Khanna, Susan Davidson, Ulf Leser. Effective and Efficient Similarity Search in Scientific Workflow Repositories. Future Generation Computer Systems, Elsevier, 2016, 56, pp.584-594. 〈10.1016/j.future.2015.06.012〉. 〈hal-01170597〉


Metrics

Record views: 558

Document downloads: 296