M. Aldinucci, M. Danelutto, P. Kilpatrick, and M. Torquati, Targeting heterogeneous architectures via macro data flow, Intl. Workshop on High-level Programming for Heterogeneous and Hierarchical Parallel Systems (HLPGPU), HiPEAC, pp.1-6, 2012.
DOI : 10.1142/S0129626412400063

M. Aldinucci, M. Danelutto, P. Kilpatrick, and M. Torquati, FastFlow: high-level and efficient streaming on multi-core, Programming Multi-core and Many-core Computing Systems, Parallel and Distributed Computing, p.13, 2013.
DOI : 10.1002/9781119332015.ch13

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.228.2451

C. Augonnet, J. Clet-Ortega, S. Thibault, and R. Namyst, Data-Aware task scheduling on multiaccelerator based platforms, IEEE 16th International Conference on Parallel and Distributed Systems (ICPADS), IEEE, p.291, 2010.
DOI : 10.1109/icpads.2010.129

URL : https://hal.archives-ouvertes.fr/inria-00523937

E. Ayguadé, R. Badia, F. Igual, J. Labarta, R. Mayo et al., An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, pp.851-862, 2009.
DOI : 10.1109/TPDS.2003.1214317

A. Binotto, B. Pedras, M. Gotz, A. Kuijper, C. Pereira et al., Effective Dynamic Scheduling on Heterogeneous Multi/Manycore Desktop Platforms, 2010 22nd International Symposium on Computer Architecture and High Performance Computing Workshops, pp.37-42, 2010.
DOI : 10.1109/SBAC-PADW.2010.6

L. Chen, O. Villa, and G. Gao, Exploring Fine-Grained Task-Based Execution on Multi-GPU Systems, 2011 IEEE International Conference on Cluster Computing, pp.386-394, 2011.
DOI : 10.1109/CLUSTER.2011.50

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.651.7760

A. Danalis, G. Marin, C. Mccurdy, J. Meredith, P. Roth et al., The Scalable Heterogeneous Computing (SHOC) benchmark suite, Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, GPGPU '10, 2010.
DOI : 10.1145/1735688.1735702

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.187.5036

G. Diamos and S. Yalamanchili, Harmony, Proceedings of the 17th international symposium on High performance distributed computing, HPDC '08, pp.197-200, 2008.
DOI : 10.1145/1383422.1383447

R. Dolbeau, HMPP: A hybrid multicore parallel, First Workshop on General Purpose Processing on Graphics Processing Units, pp.1-5.
URL : http://www.capsentreprise.com/upload

F. Qiu, A. Kaufman, and S. Yoakum-Stover, GPU cluster for high performance computing, In: Supercomputing, Proceedings of the ACM, p.47.

T. Gautier, X. Besseron, and L. Pigeon, KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, 2007.
DOI : 10.1145/1278177.1278182

URL : https://hal.archives-ouvertes.fr/hal-00647474

T. Grandpierre, C. Lavarenne, and Y. Sorel, Optimized rapid prototyping for real-time embedded heterogeneous multiprocessors, Proceedings of the seventh international workshop on Hardware/software codesign, CODES '99, pp.74-78, 1999.
DOI : 10.1145/301177.301489

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.3154

W. Gropp, E. Lusk, and A. Skjellum, Using MPI: portable parallel programming with the message-passing interface, 1999.

T. Han and T. Abdelrahman, CUDA, Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-2, pp.52-61, 2009.
DOI : 10.1145/1513895.1513902

C. Hsu, J. Pino, and S. Bhattacharyya, Multithreaded simulation for synchronous dataflow graphs, Proceedings of the 45th annual Design Automation Conference, pp.331-336, 2008.
DOI : 10.1145/1391469.1391553

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.3529

L. Itti, C. Koch, and E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Trans Pattern Anal Mach Intell, vol.20, pp.1254-1259, 1998.

K. Karimi, N. Dickson, and F. Hamze, A performance comparison of CUDA and OpenCL, p.2581, 2010.

T. Li, V. Narayana, and T. El-Ghazawi, A Static Task Scheduling Framework for Independent Tasks Accelerated Using a Shared Graphics Processing Unit, 2011 IEEE 17th International Conference on Parallel and Distributed Systems, pp.88-95, 2011.
DOI : 10.1109/ICPADS.2011.13

M. Linderman, J. Collins, H. Wang, and T. Meng, Merge, ACM SIGOPS Operating Systems Review, vol.42, issue.2, pp.287-296, 2008.
DOI : 10.1145/1353535.1346318

S. Marat, T. Ho Phuoc, L. Granjon, N. Guyader, D. Pellerin et al., Modelling Spatio-Temporal Saliency to Predict Gaze Direction for Short Videos, International Journal of Computer Vision, vol.15, issue.3, pp.231-243, 2009.
DOI : 10.1007/s11263-009-0215-3

URL : https://hal.archives-ouvertes.fr/hal-00368496

R. Membarth, F. Hannig, J. Teich, M. Korner, and W. Eckert, Frameworks for GPU Accelerators: A comprehensive evaluation using 2D/3D image registration, 2011 IEEE 9th Symposium on Application Specific Processors (SASP), pp.78-81, 2011.
DOI : 10.1109/SASP.2011.5941083

M. Ospici, D. Komatitsch, J. Mehaut, and T. Deutsch, SGPU 2: a runtime system for using of large applications on clusters of hybrid nodes, Second Workshop on Hybrid Multi-core Computing, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00788789

Y. Ou, H. Chen, and L. Lai, A dynamic load balance on GPU cluster for fork-join search, 2011 IEEE International Conference on Cloud Computing and Intelligence Systems, pp.592-596, 2011.
DOI : 10.1109/CCIS.2011.6045138

A. Rahman, D. Houzet, D. Pellerin, S. Marat, and N. Guyader, Parallel implementation of a spatio-temporal visual saliency model, Journal of Real-Time Image Processing 6, special issue; Visual Saliency Model on Multi-GPU, In: GPU Computing Gems Emerald Edition, pp.3-14, 2010.

A. Sbîrlea, Y. Zou, Z. Budimlić, J. Cong, and V. Sarkar, Mapping a data-flow programming model onto heterogeneous platforms, Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, pp.61-70, 2012.

J. Serra, Image Analysis and Mathematical Morphology, 1983.

T. Stefanov, C. Zissulescu, A. Turjan, B. Kienhuis, and E. Deprettere, System design using Kahn process networks: the Compaan/Laura approach, Proceedings of the conference on Design, automation and test in Europe - Volume, p.340, 2004.

M. Wolfe, Implementing the PGI Accelerator model, Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, GPGPU '10, pp.43-50, 2010.
DOI : 10.1145/1735688.1735697