C. Augonnet, S. Thibault, and R. Namyst, Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures, Proceedings of the Euro-Par Workshops, 2009.
DOI : 10.1007/978-3-642-14122-5_9
URL : https://hal.archives-ouvertes.fr/inria-00421333

C. Augonnet, S. Thibault, and R. Namyst, StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00467677

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Euro-Par, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00384363

E. Ayguadé, R. M. Badia, F. D. Igual, J. Labarta, R. Mayo et al., An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Euro-Par, pp.851-862, 2009.
DOI : 10.1109/TPDS.2003.1214317

R. Dolbeau, S. Bihan, and F. Bodin, HMPP: A hybrid multicore parallel programming environment, 2007.

K. Fatahalian, T. J. Knight, M. Houston, M. Erez, D. R. Horn et al., Sequoia: Programming the Memory Hierarchy, ACM/IEEE SC 2006 Conference (SC'06), 2006.
DOI : 10.1109/SC.2006.55

I. Gelado, J. Cabezas, J. E. Stone, S. Patel, N. Navarro et al., An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems, ASPLOS'10, 2010.

L. Genovese, M. Ospici, T. Deutsch, J. Méhaut, A. Neelov et al., Density functional theory calculation on many-cores hybrid central processing unit-graphic processing unit architectures, The Journal of Chemical Physics, vol.131, issue.3, p.34103, 2009.
DOI : 10.1063/1.3166140

V. V. Kindratenko, J. J. Enos, G. Shi, M. T. Showerman, G. W. Arnold et al., GPU clusters for high-performance computing, CLUSTER, pp.1-8, 2009.

O. S. Lawlor, Message passing for GPGPU clusters: CudaMPI, 2009 IEEE International Conference on Cluster Computing and Workshops, 2009.
DOI : 10.1109/CLUSTR.2009.5289129
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.539.3749

J. Lee, S. Seo, C. Kim, J. Kim, P. Chun et al., COMIC, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, PACT '08, pp.303-314, 2008.
DOI : 10.1145/1454115.1454157

H. Ltaief, S. Tomov, R. Nath, P. Du, and J. Dongarra, A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators, 2009.
DOI : 10.1007/978-3-642-03869-3_79

V. Marjanović, J. Labarta, E. Ayguadé, and M. Valero, Overlapping communication and computation by using a hybrid MPI/SMPSs approach, ICS '10: Proceedings of the 24th ACM International Conference on Supercomputing, pp.5-16, 2010.

M. Ohara, H. Inoue, Y. Sohda, H. Komatsu, and T. Nakatani, MPI microtask for programming the Cell Broadband Engine processor, IBM Systems Journal, vol.45, issue.1, 2006.
DOI : 10.1147/sj.451.0085

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.