The GPU Computing Era, IEEE Micro, vol.30, issue.2, pp.56-69, 2010. ,
DOI : 10.1109/MM.2010.41
Parallel Computing Experiences with CUDA, IEEE Micro, vol.28, issue.4, pp.13-27, 2008. ,
DOI : 10.1109/MM.2008.57
Abstract, Communications in Computational Physics, vol.28, issue.02, pp.285-329, 2014. ,
DOI : 10.1145/365559.365617
Improving Performance of OpenCL on CPUs, pp.1-20, 2012. ,
DOI : 10.1007/978-3-642-28652-0_1
Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007), pp.407-420, 2007. ,
DOI : 10.1109/MICRO.2007.30
Efficiently computing static single assignment form and the control dependence graph, ACM Transactions on Programming Languages and Systems, vol.13, issue.4, pp.451-490, 1991. ,
DOI : 10.1145/115372.115320
Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp.337-348, 2010. ,
DOI : 10.1109/MICRO.2010.41
Thread scheduling and memory coalescing for dynamic vectorization of SPMD workloads, Parallel Computing, vol.40, issue.9, pp.548-558, 2014. ,
DOI : 10.1016/j.parco.2014.03.006
URL : https://hal.archives-ouvertes.fr/hal-01087054
LLVM: A compilation framework for lifelong program analysis & transformation, International Symposium on Code Generation and Optimization, 2004. CGO 2004., pp.75-88, 2004. ,
DOI : 10.1109/CGO.2004.1281665
Compiling C* programs for a hypercube multicomputer, ACM SIGPLAN Notices, vol.23, issue.9, pp.57-65, 1988. ,
DOI : 10.1145/62116.62122
Stack-less SIMT reconvergence at low cost, ENS Lyon, Tech. Rep, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00622654
The PAR- SEC benchmark suite: characterization and architectural implications, PACT, pp.72-81, 2008. ,
A control-structure splitting optimization for GPGPU, Proceedings of the 6th ACM conference on Computing frontiers, CF '09, pp.147-150, 2009. ,
DOI : 10.1145/1531743.1531766
Divergence Analysis and Optimizations, 2011 International Conference on Parallel Architectures and Compilation Techniques, pp.320-329, 2011. ,
DOI : 10.1109/PACT.2011.63
Reducing branch divergence in GPU programs, Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-4, pp.1-3, 2011. ,
DOI : 10.1145/1964179.1964184
Openmp to gpgpu: a compiler framework for automatic translation and optimization, pp.101-110, 2009. ,
Streamlining GPU applications on the fly, Proceedings of the 24th ACM International Conference on Supercomputing, ICS '10, pp.115-126, 2010. ,
DOI : 10.1145/1810085.1810104
On-the-fly elimination of dynamic irregularities for GPU computing, ASPLOS, pp.369-380, 2011. ,
Multithreaded instruction sharing, 2010. ,
SIMD re-convergence at thread frontiers, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, pp.477-488, 2011. ,
DOI : 10.1145/2155620.2155676
Thread fusion, Proceeding of the thirteenth international symposium on Low power electronics and design, ISLPED '08, pp.363-368, 2008. ,
DOI : 10.1145/1393921.1394018