Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 337-348, 2010.
DOI: 10.1109/MICRO.2010.41
Compiling C* programs for a hypercube multicomputer, ACM SIGPLAN Notices, vol. 23, no. 9, pp. 57-65, 1988.
DOI: 10.1145/62116.62122
System and method for managing divergent threads in SIMD architecture, 2008.
The PARSEC benchmark suite, Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, PACT '08.
DOI: 10.1145/1454115.1454128
A single-program-multiple-data computational model for EPEX/FORTRAN, Parallel Computing, vol. 7, no. 1, pp. 11-24, 1988.
DOI: 10.1016/0167-8191(88)90094-4
SIMD reconvergence at thread frontiers, MICRO, 2011.
DOI: 10.1145/2155620.2155676
Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators, ISCA, ACM, pp. 129-140, 2011.
Simultaneous multithreading, ACM SIGARCH Computer Architecture News, vol. 23, no. 2, pp. 392-403, 1995.
DOI: 10.1145/225830.224449
Exploiting choice, ACM SIGARCH Computer Architecture News, vol. 24, no. 2, pp. 191-202, 1996.
DOI: 10.1145/232974.232993
Thread fusion, Proceedings of the 13th International Symposium on Low Power Electronics and Design, ISLPED '08, pp. 363-368, 2008.
DOI: 10.1145/1393921.1394018
An approximation algorithm for the shortest common supersequence problem, Proceedings of the 2001 ACM Symposium on Applied Computing, SAC '01, pp. 56-60, 2001.
DOI: 10.1145/372202.372275
Online computation and competitive analysis, 1998.
Computer Architecture: A Quantitative Approach, 2003.
Dynamic warp subdivision for integrated branch and memory divergence tolerance, ISCA.
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '08, pp. 73-82, 2008.
DOI: 10.1145/1345206.1345220
A GPGPU compiler for memory optimization and parallelism management, PLDI, ACM, pp. 86-97, 2010.
Performance in GPU architectures: Potentials and distances, pp. 75-81, 2011.
Dynamic Detection of Uniform and Affine Vectors in GPGPU Computations, pp. 46-55, 2009.
DOI: 10.1007/978-3-642-14122-5_8
URL: https://hal.archives-ouvertes.fr/hal-00396719