O. Beaumont, A. Legrand, L. Marchal, and Y. Robert, Assessing the impact and limits of steady-state scheduling for mixed task and data parallelism on heterogeneous platforms, HeteroPar'2004: International Conference on Heterogeneous Computing, jointly published with ISPDC'2004: International Symposium on Parallel and Distributed Computing, pp.296-302, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00789444

M. D. Beynon, T. Kurc, A. Sussman, and J. Saltz, Optimizing execution of component-based applications using group instances, Future Generation Computer Systems, vol.18, issue.4, pp.435-448, 2002.

M. Beynon, A. Sussman, U. Catalyurek, T. Kurc, and J. Saltz, Performance optimization for data intensive grid applications, Proceedings of the Third Annual International Workshop on Active Middleware Services (AMS'01), 2001.

P. B. Bhat, C. S. Raghavendra, and V. K. Prasanna, Efficient collective communication in distributed heterogeneous systems, ICDCS'99 19th International Conference on Distributed Computing Systems, pp.15-24, 1999.

P. B. Bhat, C. S. Raghavendra, and V. K. Prasanna, Efficient collective communication in distributed heterogeneous systems, Journal of Parallel and Distributed Computing, vol.63, pp.251-263, 2003.

S. H. Bokhari, Partitioning problems in parallel, pipeline, and distributed computing, IEEE Trans. Computers, vol.37, issue.1, pp.48-57, 1988.

M. Cole, Bringing Skeletons out of the Closet: A Pragmatic Manifesto for Skeletal Parallel Programming, Parallel Computing, vol.30, issue.3, pp.389-406, 2004.

M. R. Garey and D. S. Johnson, Computers and Intractability, a Guide to the Theory of NP-Completeness, 1979.

P. Hansen and K. Lih, Improved algorithms for partitioning problems in parallel, pipeline, and distributed computing, IEEE Trans. Computers, vol.41, issue.6, pp.769-771, 1992.

M. Iqbal, Approximate algorithms for partitioning problems, Int. J. Parallel Programming, vol.20, issue.5, pp.341-361, 1991.

M. Iqbal and S. H. Bokhari, Efficient algorithms for a class of partitioning problems, IEEE Trans. Parallel and Distributed Systems, vol.6, issue.2, pp.170-175, 1995.

B. Olstad and F. Manne, Efficient partitioning of sequences, IEEE Transactions on Computers, vol.44, issue.11, pp.1322-1326, 1995.

A. Pinar and C. Aykanat, Fast optimal load balancing algorithms for 1D partitioning, J. Parallel Distributed Computing, vol.64, issue.8, pp.974-996, 2004.

F. A. Rabhi and S. Gorlatch, Patterns and Skeletons for Parallel and Distributed Computing, 2002.

T. Saif and M. Parashar, Understanding the behavior and performance of non-blocking communications in MPI, Proceedings of Euro-Par, vol.3149, pp.173-182, 2004.

B. A. Shirazi, A. R. Hurson, and K. M. Kavi, Scheduling and load balancing in parallel and distributed systems, 1995.

M. Spencer, R. Ferreira, M. Beynon, T. Kurc, U. Catalyurek et al., Executing multiple pipelined data analysis operations in the grid, 2002 ACM/IEEE Supercomputing Conference, 2002.

J. Subhlok and G. Vondran, Optimal mapping of sequences of data parallel tasks, Proc. 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'95, pp.134-143, 1995.

J. Subhlok and G. Vondran, Optimal latency-throughput tradeoffs for data parallel pipelines, ACM Symposium on Parallel Algorithms and Architectures SPAA'96, pp.62-71, 1996.

K. Taura and A. A. Chien, A heuristic algorithm for mapping communicating tasks on heterogeneous resources, Heterogeneous Computing Workshop, pp.102-115, 2000.

N. Vydyanathan, U. Catalyurek, T. Kurc, P. Sadayappan, and J. Saltz, An approach for optimizing latency under throughput constraints for application workflows on clusters, 2007.
