J. Nickolls and W. J. Dally, The GPU Computing Era, IEEE Micro, vol.30, issue.2, pp.56-69, 2010.
DOI : 10.1109/MM.2010.41

M. Garland, Parallel Computing Experiences with CUDA, IEEE Micro, vol.28, issue.4, pp.13-27, 2008.
DOI : 10.1109/MM.2008.57

C. A. Navarro, N. Hitschfeld-Kahler, and L. Mateu, A Survey on Parallel Computing and its Applications in Data-Parallel Problems, Communications in Computational Physics, vol.28, issue.2, pp.285-329, 2014.
DOI : 10.1145/365559.365617

R. Karrenberg and S. Hack, Improving Performance of OpenCL on CPUs, pp.1-20, 2012.
DOI : 10.1007/978-3-642-28652-0_1

W. W. Fung, I. Sham, G. Yuan, and T. M. Aamodt, Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007), pp.407-420, 2007.
DOI : 10.1109/MICRO.2007.30

R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, Efficiently computing static single assignment form and the control dependence graph, ACM Transactions on Programming Languages and Systems, vol.13, issue.4, pp.451-490, 1991.
DOI : 10.1145/115372.115320

G. Long, D. Franklin, S. Biswas, P. Ortiz, J. Oberg et al., Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp.337-348, 2010.
DOI : 10.1109/MICRO.2010.41

T. Milanez, S. Collange, F. M. Pereira, W. Meira, and R. Ferreira, Thread scheduling and memory coalescing for dynamic vectorization of SPMD workloads, Parallel Computing, vol.40, issue.9, pp.548-558, 2014.
DOI : 10.1016/j.parco.2014.03.006
URL : https://hal.archives-ouvertes.fr/hal-01087054

C. Lattner and V. S. Adve, LLVM: A compilation framework for lifelong program analysis & transformation, International Symposium on Code Generation and Optimization, 2004. CGO 2004., pp.75-88, 2004.
DOI : 10.1109/CGO.2004.1281665

M. J. Quinn, P. J. Hatcher, and K. C. Jourdenais, Compiling C* programs for a hypercube multicomputer, ACM SIGPLAN Notices, vol.23, issue.9, pp.57-65, 1988.
DOI : 10.1145/62116.62122

S. Collange, Stack-less SIMT reconvergence at low cost, ENS Lyon, Tech. Rep, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00622654

C. Bienia, S. Kumar, J. P. Singh, and K. Li, The PARSEC benchmark suite: characterization and architectural implications, PACT, pp.72-81, 2008.

S. Carrillo, J. Siegel, and X. Li, A control-structure splitting optimization for GPGPU, Proceedings of the 6th ACM conference on Computing frontiers, CF '09, pp.147-150, 2009.
DOI : 10.1145/1531743.1531766

B. Coutinho, D. Sampaio, F. M. Pereira, and W. Meira Jr., Divergence Analysis and Optimizations, 2011 International Conference on Parallel Architectures and Compilation Techniques, pp.320-329, 2011.
DOI : 10.1109/PACT.2011.63

T. D. Han and T. S. Abdelrahman, Reducing branch divergence in GPU programs, Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-4, pp.1-3, 2011.
DOI : 10.1145/1964179.1964184

S. Lee, S. Min, and R. Eigenmann, OpenMP to GPGPU: a compiler framework for automatic translation and optimization, PPoPP, pp.101-110, 2009.

E. Z. Zhang, Y. Jiang, Z. Guo, and X. Shen, Streamlining GPU applications on the fly, Proceedings of the 24th ACM International Conference on Supercomputing, ICS '10, pp.115-126, 2010.
DOI : 10.1145/1810085.1810104

E. Z. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen, On-the-fly elimination of dynamic irregularities for GPU computing, ASPLOS, pp.369-380, 2011.

M. Dechene, E. Forbes, and E. Rotenberg, Multithreaded instruction sharing, 2010.

G. Diamos, B. Ashbaugh, S. Maiyuran, A. Kerr, H. Wu et al., SIMD re-convergence at thread frontiers, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, pp.477-488, 2011.
DOI : 10.1145/2155620.2155676

J. González, Q. Cai, P. Chaparro, G. Magklis, R. Rakvic et al., Thread fusion, Proceeding of the thirteenth international symposium on Low power electronics and design, ISLPED '08, pp.363-368, 2008.
DOI : 10.1145/1393921.1394018