A. Agrawal, L. Benoit, Y. Magnan, and . Robert, Scheduling algorithms for linear workflow optimization Mapping linear workflows with computation/communication overlap, IPDPS'2010, the 24th IEEE International Parallel and Distributed Processing Symposium ICPADS'2008, the 14th IEEE International Conference on Parallel and Distributed Systems, 2008.

R. B. Allan, R. M. Jones, S. J. Lee, Y. Ahmad, M. Gene et al., Software pipelining On exploiting task duplication in parallel program scheduling Validity of the single processor approach to achieving large scale computing capabilities Scheduling problems with two competing agents Denis Trystram . Fault tolerance and availability awareness in computational grids Fundamentals of Grid Computing, Numerical Analysis and Scientific Computing, LogGP: Incorporating Long Messages into the LogP Model SPAA '95: Proceedings of the seventh annual symposium on Parallelism in algorithms and architecturesBBG + 09 ] Xavier Besseron The GrADS Project: Software Support for High- Level Grid Application Development. International Journal of High Performance Computing ApplicationsBen09 ] Anne Benoit. Scheduling pipelined applications: models, algorithms and complexity, pp.367-432872, 1967.

D. Michael, . Beynon-anne, B. Benoit, M. Gaujal, Y. Gallet et al., Computing the throughput of replicated workflows on heterogeneous platforms Macro pipelining based scheduling on high performance heterogeneous multiprocessor systems Implementation of a portable nested data-parallel language, Supporting Data Intensive Applications in a Heterogeneous Environment ICPP'2009, the 38th International Conference on Parallel Processing, pp.1468-14844, 1994.

J. S. Sussman, T. Bansal, K. Kimbrel, H. Benoit, V. Kosch et al., Speed scaling to manage energy and temperature Multicriteria Scheduling of Pipeline Workflows (and Application to the JPEG Encoder) Assessing the impact and limits of steady-state scheduling for mixed task and data parallelism on heterogeneous platforms, Distributed processing of very large datasets with DataCutter. Parallel Computing ISPDC'04, the 3rd International Symposium on Parallel and Distributed Computing/3rd International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks, pp.1457-14781, 2001.

B. Susan, . Davidsonbok88, H. Shahid, A. Bokhari, Y. Benoit et al., A model for user-oriented data provenance in pipelined scientific workflows Partitioning problems in parallel, pipeline, and distributed computing Mapping pipeline skeletons onto heterogeneous platforms, Provenance and Annotation of Data, International Provenance and Annotation Workshop (IPAW)BR09 ] Anne Benoit and Yves Robert. Multi-criteria mapping techniques for pipeline workflows on heterogeneous platforms, pp.133-14748, 1988.

D. A. Arabnia, A. Power, Y. R. Benoit, A. Benoit, P. Renaud-goud et al., Performance and energy optimization of concurrent pipelined applications Efficient collective communication in distributed heterogeneous systems Multi-criteria scheduling of pipeline workflows Optimizing latency and reliability of pipeline workflow applications On the complexity of mapping linear chain applications onto heterogeneous platforms [Bru07 ] Peter Brucker. Scheduling Algorithms [BS05 ] Ragnhild Blikberg and Tor Sørevik. Load balancing and OpenMP implementation of nested parallelism Scheduling problems in parallel query optimization LogP: towards a realistic model of parallel computation Design, implementation and evaluation of parallel pipelined STAP on parallel computers Optimal processor assignment for a class of pipeline computations Bringing skeletons out of the closet: A pragmatic manifesto for skeletal parallel programming Data driven nets: A maximally concurrent, procedural, parallel process representation for distributed control systems Grid Resource Management, chapter Workflow management in GriPhyN Dennis. First version of a data flow procedure language Scheduling recurrent precedence-constrained task graphs on a symmetric shared-memory multiprocessor Scheduling data flow applications using linear programming Multiobjective scheduling, Grid Technology and Applications IPDPS'2010, the 24th IEEE International Parallel and Distributed Processing Symposium HeteroPar'07, the 6th International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks HCW'08, the 17th International Heterogeneity in Computing Workshop PODS'1995, the 14th ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems PPOPP'1993, the 4th ACM SIGPLAN symposium on Principles and practice of parallel programming ACM. [CLW + 00 ] Alok ChoudharyCNNS94 ] Alok Choudhary, Bhagirath Narahari Symposium on Programming Dennis. Data flow supercomputersPar 2009 Parallel Processing ICPP'2005, the 34th International Conference on Parallel Processing Introduction to SchedulingDRV00 ] Alain Darte, Yves Robert, and Frederic Vivien. Scheduling and Automatic Parallelization, pp.65-99689, 1974.

D. S. Jacob, A. Katz-thomas-fahringer, S. Jugravu, R. Pllana, C. Prodan et al., The Anatomy of the Grid: Enabling Scalable Virtual Organizations Teuta: Tool support for performance modeling of distributed and parallel applications Tools for Program Development and Analysis in Computational Science Theory and practice in parallel job scheduling Bounds for LPT schedules on uniform processors Computers and Intractability Sensitivity analysis of tree scheduling on two machines with communication delays Faster Combinatorial Approximation Algorithm for Scheduling Unrelated Parallel Machines Bounds for certain multiprocessing anomalies Bounds on multiprocessing timing anomalies Exploiting throughput for pipeline execution in streaming image processing applications Optimizing latency under throughput requirements for streaming applications on cluster execution Reliability versus performance for critical applications A Component-Based Framework for the Cell Broadband Engine Biomedical Image Analysis on a Cooperative Cluster of GPUs and Multicores Investigating the Use of GPU-Accelerated Nodes for SAR Image Formation Compile-time scheduling of dynamic constructs in dataflow program graphs Optimization algorithms for exploiting the parallelism-communication tradeoff in pipelined parallelism Mapping a chain task to chained processors Hary and Fusun Ozguner. Precedence-constrained task allocation onto point-to-point networks for pipelined execution Approximation Algorithms for NP-hard problems Bandwidth-aware resource allocation for heterogeneous computing systems to maximize throughput, Pegasus: a framework for mapping complex scientific workflows onto distributed systems. Scientific Programming Journal Concurrency and Computation: Practice and Experience International Conference on Computational Science Proceedings of the Job Scheduling Strategies for Parallel ProcessingGMW05 ] Martin Gairing, Burkhard Monien, and Andreas Woclaw. Automata, Languages and Programming Parallel Processing Cluster Computing Proceedings of 23rd International. Parallel and Distributed Processing Symposium, The 18th Heterogeneous Computing Workshop Proceedings of the 22nd Annual International Conference on Supercomputing , ICS 2008 Proceedings of the IEEE International Conference on Cluster Computing, Workshop on Parallel Programming on Accelerator Clusters (PPAC) VLDB Bhagirath Narahari, and Hyeong-Ah Choi ICPP'2003, the 32th International Conference on Parallel ProcessingHS87 ] Dorit S. Hochbaum and David B. Shmoys. Using dual approximation algorithms for scheduling problems: Practical and theoretical resultsHS88 ] Dorit S. Hochbaum and David B. Shmoys. A polynomial approximation scheme for scheduling on uniform processors: Using the dual approximation approach. SIAM Journal on Computing, pp.219-237143, 1966.

C. Dryad-ravindra-jejurikar, R. Pereira, K. Gupta, J. R. Kennedy, Y. Allen-jihie-kim et al., Approximate algorithms for partitioning problems Leakage aware dynamic voltage scaling for real-time embedded systems Real-time scheduling for pipelined execution of data flow graphs on a realistic multiprocessor architecture Benchmarking and comparison of the task graph scheduling algorithms Static scheduling algorithms for allocating directed task graphs to multiprocessors Optimizing compilers for modern architectures: a dependence-based approach The semantics of simple language for parallel programming A knowledge-based approach to interactive workflow composition, EuroSys'2007, the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems Proceedings of DAC'04, the 41st annual Design Automation Conferencea ACM. [JV96 ] Jon Jonsson and Jonas Vasell ICASSP-96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing IFIP Congress 14th International Conference on Automatic Planning and Scheduling (ICAPS 04)KN10 ] Ekasit Kijsipongse and Sudsanguan Ngamsuriyaroj. Placing pipeline stages on a Grid: Single path and multipath pipeline executionKRC + 99 ] Kathleen Knobe, James M. Rehg, Arun Chauhan, Rishiyur S. Nikhil, and Umakishore Ramachandran. Scheduling constrained dynamic applications on clusters Su- percomputing'1999, the 1999 ACM/IEEE conference on Supercomputing, pp.59-72341, 1974.

. Cheng, J. Michael, M. Litzkow, M. W. Livny, . Mutka et al., Condor-a hunter of idle workstations A mapping methodology for designing software task pipelines for embedded signal processing Approximation algorithms for scheduling unrelated parallel machines, Proceedings of the 8th International Conference on Distributed Computing Systems Proceedings of the Workshop on Embedded HPC Systems and Applications of IPPS/SPDP Dataflow process networks. Proceedings of the IEEE Mathematical ProgrammingLT02 ] Renaud Lepere and Denis Trystram. A new clustering algorithm for large communication delays. In International Parallel and Distributed Processing Symposium (IPDPS'2002, pp.352-361, 1988.

A. Mackenzie-graham, A. Payan, I. D. Dinov, J. D. Van-horn, A. W. Toga-andy et al., Neuroimaging Data Provenance Using the LONI Pipeline Workflow Environment http://msdn.microsoft.com/en- us/devlabs/dd795202.aspx [Mil99 ] Mark P. Mills. The internet begins with coal: A preliminary exploration of the impact of the Internet on electricity consumption: a green policy paper for the Greening Earth Society. Mills-McCarthy & Associates, 1999. [MO95 ] Fredrik Manne and Bjørn Olstad. Efficient partitioning of sequences Daedalus: toward composable multimedia MP-SoC design, Parallel Processing Provenance and Annotation of Data, International Provenance and Annotation Workshop (IPAW)Mic09 ] Microsoft. AXUM webpageNic94 ] David Nicol. Rectilinear partitioning of irregular data parallel computations. Journal on Parallel and Distributed Computing DAC '08: Proceedings of the 45th annual Design Automation Conference, pp.295-304, 1994.

P. P?nar, E. Tabak, and C. Aykanat, One-dimensional partitioning for heterogeneous systems: Theory and practice, Central Institute for Applied Mathematics FOCS 41st Annual Symposium on Foundations of Computer Science ICPP '02: Proceedings of the 2001 International Conference on Parallel ProcessingRei07 ] James Reinders. Intel Threading Building Blocks. O' Reilly, 2007. [RKO + 03RS87 ] Vic J. Rayward-Smith. UET scheduling with interprocessor communication delays, pp.1473-1486, 2000.
DOI : 10.1016/j.jpdc.2008.07.005

J. Vic, F. W. Rayward-smith, G. J. Burton, and . Janacek, Scheduling parallel program assuming preallocation, Theory and its Applications, pp.55-71, 1987.

C. , A. Sussman, J. S. Kong, H. Shimada, ¨. Umit et al., Executing multiple pipelined data analysis operations in the grid Computer-aided prognosis of neuroblastoma on whole-slide images: Classification of stromal development Understanding the behavior and performance of non-blocking communications in MPI, [SKS + 09 ] Olcay Sertel Euro-Par 2004 Parallel ProcessingSRM06 ] Vivy Suhendra, Chandrashekar Raghavan, and Tulika Mitra. Integrated scratchpad memory optimization and task scheduling for MPSoC architectures CASES '06, pp.1-181093, 2002.

G. Subhlok, K. Vondran, C. E. Bouibede-hocine, M. Jr, and R. Ferreira, Optimal mapping of sequences of data parallel tasks Optimal latency-throughput tradeoffs for data parallel pipelines Counting and enumeration complexity with application to multicriteria scheduling A heuristic algorithm for mapping communicating tasks on heterogeneous resources Achieving multi-level parallelism in filter-labeled stream programming model, ACM/IEEE International Conference on Compilers, Architecture, and Synthesis for Embedded Systems PPOPP'1995, the 5th ACM SIGPLAN symposium on Principles and practice of parallel programming ACM. [SV96 ] Jaspal Subhlok and Gary Vondran SPAA'1996, the 8th annual ACM symposium on Parallel algorithms and architectures HCW'1999, the Heterogeneous Computing Workshop ICPP'2008, the 37th International Conference on Parallel Processing Condor and the Grid, pp.134-143, 1995.

T. Tannenbaum, D. Wright, K. Miller, M. Taylor, X. Wu et al., Condor ? a distributed job scheduler Beowulf Cluster Computing with Linux Prophesy: an infrastructure for performance analysis and modeling of parallel and grid applications Toward optimizing latency under throughput constraints for application workflows on clusters Optimizing latency and throughput of application workflows on clusters The recognition of series parallel digraphs Optimizing Supercompilers for Supercomputers The network weather service: a distributed resource performance forecasting service for metacomputing Towards Energy Aware Scheduling for Precedence Constrained Parallel Tasks in a Cluster with DVFS A scheduling model for reduced CPU energy, Grid Computing: Making the Global Infrastructure a Reality Euro-Par 2007 Parallel Processing Press Proceedings of CCGrid'2010, the 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing. [YB05 ] Jia Yu and Rajkumar Buyya. A Taxonomy of Workflow Management Systems for Grid Computing Proceedings of FOCS '95, the 36th Annual Symposium on Foundations of Computer Science. [YKS03 ] Mau-Tsuen Yang, Rangachar Kasturi, and Anand Sivasubramaniam. A Pipeline- Based Approach for Scheduling Video Processing Algorithms on NOW. IEEE Transactions on Parallel and Distributed Systems, pp.13-18, 1982.