U. A. Acar, G. E. Blelloch, and R. D. Blumofe, The data locality of work stealing, Proc. of ACM SPAA, SPAA '00, pp.1-12, 2000.

E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, J. Langou et al., LU factorization for accelerator-based systems, 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), pp.217-224, 2011.
DOI : 10.1109/AICCSA.2011.6126599
URL : https://hal.archives-ouvertes.fr/hal-00654193

C. Augonnet, S. Thibault, R. Namyst, and P. A. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.4, pp.187-198, 2011.
DOI : 10.1002/cpe.1631
URL : https://hal.archives-ouvertes.fr/inria-00384363

E. Ayguadé, R. Badia, F. Igual, J. Labarta, R. Mayo et al., An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Proc. of Euro-Par, pp.851-862, 2009.
DOI : 10.1109/TPDS.2003.1214317

C. Boeres, G. Chochia, and P. Thanisch, On the scope of applicability of the ETF algorithm, Proc. of the 2nd International Workshop on Parallel Algorithms for Irregularly Structured Problems, IRREGULAR '95, pp.159-164, 1995.
DOI : 10.1007/3-540-60321-2_13

F. Broquedis, T. Gautier, and V. Danjean, libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms, pp.102-115
DOI : 10.1007/978-3-642-30961-8_8
URL : https://hal.archives-ouvertes.fr/hal-00796253

J. Bueno, J. Planas, A. Duran, R. M. Badia, X. Martorell et al., Productive Programming of GPU Clusters with OmpSs, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012.
DOI : 10.1109/IPDPS.2012.58

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

M. Frigo, C. E. Leiserson, and K. H. Randall, The implementation of the Cilk-5 multithreaded language, Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, PLDI '98, pp.212-223, 1998.

F. Galilée, J. L. Roch, G. G. Cavalheiro, and M. Doreille, Athapascan-1: On-line building data flow graph in a parallel language, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192), pp.88-95, 1998.
DOI : 10.1109/PACT.1998.727176

T. Gautier, X. Besseron, and L. Pigeon, KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, 2007.
DOI : 10.1145/1278177.1278182
URL : https://hal.archives-ouvertes.fr/hal-00647474

Y. Guo, J. Zhao, V. Cave, and V. Sarkar, Slaw: A scalable locality-aware adaptive work-stealing scheduler, Proc. of IEEE IPDPS, pp.1-12, 2010.

E. Hermann, B. Raffin, F. C. Faure, T. Gautier, and J. Allard, Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations, Proc. of Euro-Par, pp.235-246, 2010.
DOI : 10.1007/978-3-642-15291-7_23
URL : https://hal.archives-ouvertes.fr/inria-00502448

J. Kurzak, H. Ltaief, J. Dongarra, and R. M. Badia, Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, vol.22, issue.1, pp.15-44, 2010.
DOI : 10.1145/1377612.1377615
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.177.3294

G. Quintana-Ortí, F. D. Igual, E. S. Quintana-Ortí, and R. A. van de Geijn, Solving dense linear systems on platforms with multiple hardware accelerators, ACM SIGPLAN Notices, vol.44, issue.4, pp.121-130, 2009.
DOI : 10.1145/1594835.1504196

A. Robison, M. Voss, and A. Kukanov, Optimization via Reflection on Work Stealing in TBB, 2008 IEEE International Symposium on Parallel and Distributed Processing, pp.1-8, 2008.
DOI : 10.1109/IPDPS.2008.4536188

F. Song and J. Dongarra, A scalable framework for heterogeneous GPU-based clusters, Proceedings of the 24th ACM symposium on Parallelism in algorithms and architectures, SPAA '12, pp.91-100, 2012.
DOI : 10.1145/2312005.2312025

S. Tomov, J. Dongarra, and M. Baboulin, Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, issue.5-6, pp.232-240, 2010.
DOI : 10.1016/j.parco.2009.12.005